About Escape Sequences

on 2020-11-12

This is translated by LLM. I haven’t verified the content yet.

Introduction

First, here is an interesting problem:

// escape.c
printf("??ab?c\t?dfrge\rf\tg\n");
printf("??ab?c\t?dfrge\rf\tg");

What should these two lines of code output respectively? When I saw this question, I was confused because characters like \r and \b are rarely used. So, I tested it on my machine.

### Environment: Windows 10 20H2, gcc, powershell ###
f 	gdfrge
f 	gdfrge
### Environment: Ubuntu 20.04, gcc, zsh ###
f?ab?c  gdfrge
f?ab?c  g

Seeing the output, I was like: ??? There are several issues with this output that are hard to explain:

  1. Why are the results different in different environments?
  2. In the Ubuntu environment, why does the output of the second statement lack the last 5 characters?
  3. Divide the string into two parts: before \r and after \r. Calculating based on the position of dfrge, the width of \t in the first part is 2 characters, while in the latter part, the width of \t is 7 characters. How is the width of \t calculated?

Escape Sequences

Let’s first review some of the escape sequences described in the book:

Escape Sequence    Meaning
\b                 Backspace.
\n                 Newline.
\r                 Carriage return.
\t                 Horizontal tab.
\v                 Vertical tab.
\a                 Alert (ANSI C).
\f                 Form feed.

Let’s look at these escape sequences individually:

\b moves the cursor back one position. Note that this is not the same as pressing the Backspace key on the keyboard: \b only moves the cursor, and the character it moves over is not deleted. Whatever is output next is written starting at the cursor position and overwrites what was there. The same applies to input typed at the terminal, whose echo also overwrites the characters at and after the cursor.

char name[32];
printf("Hi there!\b, ");
printf("welcame!\b\b\b\bo\n");
// Five \b: four for the underscores plus one for the trailing period.
printf("Please type your name: ____.\b\b\b\b\b");
scanf("%31s", name);
----------------------------------------
Output:
Hi there, welcome!
//Input John
Please type your name: John.
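The number of \b characters has to match exactly what the cursor must step back over: the input field plus any punctuation after it. As a minimal sketch of that bookkeeping, the helper below builds such a prompt into a buffer. The name format_field_prompt and its parameters are hypothetical, made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<label><width underscores>." followed by
 * (width + 1) backspaces into buf, so that the cursor ends up on the
 * first underscore when buf is printed to a terminal.
 * Returns the number of characters written (excluding the '\0'),
 * or -1 if buf is too small. */
int format_field_prompt(char *buf, size_t cap, const char *label, int width)
{
    assert(width >= 0);
    size_t label_len = strlen(label);
    /* label + underscores + '.' + backspaces + '\0' */
    size_t need = label_len + (size_t)width + 1 + ((size_t)width + 1) + 1;
    if (need > cap)
        return -1;

    char *p = buf;
    memcpy(p, label, label_len);
    p += label_len;
    memset(p, '_', (size_t)width);
    p += width;
    *p++ = '.';
    memset(p, '\b', (size_t)width + 1);   /* step back over "____." */
    p += width + 1;
    *p = '\0';
    return (int)(p - buf);
}
```

To use it, print the buffer with fputs(buf, stdout), call fflush(stdout), and then read the input.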

\n and \r are discussed together because different systems mark the end of a line in a file differently, so there is no single convention for these two characters. Windows ends a line with the pair \r\n: Carriage Return (\r) moves the cursor to the beginning of the line, and Line Feed (\n) moves it to the line directly below the current position. Linux uses \n alone to mean Return + Newline. If you want to know why there is such a difference, Jeff Atwood, co-founder of Stack Overflow, explains this issue very clearly in this article.

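On the command line, however, there doesn't seem to be such a distinction: printing \n starts a new line at column 0 on both systems. I have not tested the reason thoroughly. A likely explanation is that the translation happens below the program: on Windows, a C stream opened in text mode writes \n as \r\n, and on Linux the terminal driver by default turns an output \n into \r\n.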
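The difference matters mostly when a file written on Windows is read byte by byte on Linux. As a rough sketch, assuming the text is already in memory as a C string, the function below converts Windows line endings to Linux ones in place. The name crlf_to_lf is an assumption for this example, not a library function.

```c
#include <assert.h>
#include <stddef.h>

/* Converts every "\r\n" pair in s to a single '\n', in place.
 * A lone '\r' (not followed by '\n') is left untouched.
 * Returns the new length of the string. */
size_t crlf_to_lf(char *s)
{
    assert(s != NULL);
    size_t r = 0, w = 0;          /* read index, write index */
    while (s[r] != '\0') {
        if (s[r] == '\r' && s[r + 1] == '\n')
            r++;                  /* skip the '\r', keep the '\n' */
        s[w++] = s[r++];
    }
    s[w] = '\0';
    return w;
}
```

A lone \r is kept on purpose, since it may be a deliberate "return to the start of the line" like the ones used elsewhere in this post.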

\t moves the cursor to the next tab stop. Suppose the tab width in your current environment is w (in a terminal, usually w = 8) and the cursor is at column p, counting from 0. Outputting a \t moves the cursor to column n*w, where n is the smallest integer with n*w > p. The inequality is strict: a cursor that already sits on a tab stop still moves on to the next one.

printf("bbbbbbb\ta"); putchar('\n');
printf("bbbbbbbb\ta");
----------------------------------------
Output:
bbbbbbb a
bbbbbbbb        a
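The tab-stop rule fits in one line of integer arithmetic. This is a small sketch that assumes 0-based columns and evenly spaced tab stops; next_tab_stop is a name chosen for this example.

```c
#include <assert.h>

/* Column (0-based) the cursor lands on after a '\t' is output at
 * column col, assuming tab stops every width columns.
 * The result is always strictly greater than col. */
int next_tab_stop(int col, int width)
{
    assert(col >= 0 && width > 0);
    return (col / width + 1) * width;
}
```

With width 8, a cursor at column 7 moves to column 8 (one blank), while a cursor at column 8 moves to column 16 (eight blanks), which matches the two output lines above.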

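\v Vertical tab. It moves the cursor down to the next vertical tab stop. On many terminals the visible effect is a line feed that keeps the current column, roughly what \n followed by enough spacing to return to the same column would give. It is said that it appeared to speed up printer printing (?)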

\a Plays an alert sound. This control character is quite interesting and can be used to prompt the user.

\f Form feed. It tells a printer to jump to the next page, but modern terminals do not seem to implement it as a page break.


Explanation of the Issues

1. Why are the results different in different environments?

Essentially, escape sequences are just characters. C can only pass these characters into the output stream. How they are rendered is up to whatever consumes the stream (the terminal or console), and that differs between systems and environments.

For example, when the code above runs under Windows, the result does not contain ?ab?c. This indicates that when Windows processes the second \t in "??ab?c\t?dfrge\rf\tg\n" (the one after f), it overwrites the content that has already been output, i.e., ?ab?c, with the tab. This does not happen in Ubuntu, where \t merely moves the cursor to the next tab stop without overwriting anything.
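One way to convince yourself that the program sends the same bytes everywhere, and that only the rendering differs, is to print the string with its control characters spelled out. The function below is a minimal sketch of that idea; make_visible and its helper are hypothetical names, and the output format is just one possible choice.

```c
#include <stdio.h>
#include <string.h>

/* Returns a printable spelling for one byte. tmp is scratch space
 * (at least 5 bytes) used when no fixed spelling exists. */
static const char *spelling(unsigned char c, char *tmp)
{
    switch (c) {
    case '\a': return "\\a";
    case '\b': return "\\b";
    case '\f': return "\\f";
    case '\n': return "\\n";
    case '\r': return "\\r";
    case '\t': return "\\t";
    case '\v': return "\\v";
    default:   break;
    }
    if (c < 0x20 || c == 0x7f)
        snprintf(tmp, 5, "\\x%02x", (unsigned)c);  /* other control bytes */
    else
        snprintf(tmp, 5, "%c", c);                 /* ordinary byte */
    return tmp;
}

/* Copies s into out with control characters made visible.
 * Stops early if out (cap bytes) is too small.
 * Returns the length of the result. */
size_t make_visible(const char *s, char *out, size_t cap)
{
    size_t w = 0;
    if (cap == 0)
        return 0;
    for (; *s != '\0'; s++) {
        char tmp[5];
        const char *piece = spelling((unsigned char)*s, tmp);
        size_t len = strlen(piece);
        if (w + len >= cap)
            break;
        memcpy(out + w, piece, len);
        w += len;
    }
    out[w] = '\0';
    return w;
}
```

Printing the result of make_visible for the test string shows the same sequence of characters on Windows and on Ubuntu; the difference only appears once a terminal interprets them.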

2. In the Ubuntu environment, why does the second statement’s output lack the last 5 characters?

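At first glance the Linux terminal seems to print only up to the cursor position, but the characters are not actually dropped. After the second statement the cursor sits right after g, on top of dfrge, and there is no trailing \n to move it to a new line. Whatever is written next starts at the cursor and overwrites what is there. Here that is most likely the shell's prompt handling in zsh, so the last 5 characters are gone by the time the prompt appears. Testing another string on Ubuntu, with a trailing \t that moves the cursor past dfrge:

printf("??ab?c\t?dfrge\rf\tg\t");
----------------------------------------
Output:
f?ab?c  gdfrge

On one hand, this shows that \t on Ubuntu does not overwrite characters that have already been output. On the other hand, it shows that only the content at and after the cursor is at risk: once the cursor has moved past dfrge, nothing that follows overwrites it.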

3. How is the width of \t calculated?

Refer to the description of tab stops in the previous section.
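The behaviour observed on Ubuntu can be reproduced with a toy model of a single terminal line. This is only a sketch under stated assumptions: \r, \b and \t move the cursor without erasing anything, every other character overwrites the cell under the cursor, and the input contains no \n. render_line is a hypothetical name, and real terminals differ in many details.

```c
#include <assert.h>
#include <string.h>

/* Toy model of one terminal line with overwrite semantics:
 *   '\r' moves the cursor to column 0,
 *   '\b' moves it one column left (never past column 0),
 *   '\t' moves it to the next tab stop without erasing anything,
 *   any other character is stored at the cursor, which then advances.
 * The visible line is written to out (cap bytes, including the '\0').
 * Returns the final cursor column. */
size_t render_line(const char *in, char *out, size_t cap, size_t tabw)
{
    assert(cap > 0 && tabw > 0);
    size_t cur = 0, len = 0;          /* cursor column, visible length */
    memset(out, ' ', cap - 1);        /* untouched cells show as blanks */
    for (; *in != '\0'; in++) {
        switch (*in) {
        case '\r': cur = 0; break;
        case '\b': if (cur > 0) cur--; break;
        case '\t': cur = (cur / tabw + 1) * tabw; break;
        default:
            if (cur < cap - 1) {      /* ignore writes past the buffer */
                out[cur] = *in;
                if (cur + 1 > len)
                    len = cur + 1;
            }
            cur++;
            break;
        }
    }
    out[len] = '\0';
    return cur;
}
```

For "??ab?c\t?dfrge\rf\tg" with a tab width of 8, the model yields f?ab?c  gdfrge with the cursor left at column 9, directly on top of dfrge. That is where the next output would land. With the extra trailing \t the cursor ends at column 16, past the text.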


ANSI Escape Sequences

We left a question earlier: \f cannot achieve a forced page break. So, if we want to implement a forced page break (clear screen) operation, what should we do?

ANSI escape sequences are a standard for in-band signaling to control cursor location, color, and other options on video text terminals. Certain sequences of bytes, most starting with an ASCII escape character and a bracket character, are embedded into text. The terminal interprets these sequences as commands, rather than as text to display verbatim.

-- Wikipedia

ANSI escape sequences can achieve more functions, including controlling the cursor to move up, controlling the color of output characters, clearing the screen, etc. Here is a complete lookup table. The following example uses some commonly used functions.

//Environment: Ubuntu 20.04, gcc, zsh
//Not tested on Windows. Windows 10 and Windows Terminal support
//ANSI escape sequences, so this should work there as well.
#include <stdio.h>
#include <unistd.h>

int main(void){
    printf("Hello, Welcome to this ANSI escape sequence test\n");
    printf("\033[33mNow, we are printing in yellow\n");       // 33: yellow foreground
    printf("\033[4mYou can also underline sentences.\n");     // 4: underline (yellow stays on)
    printf("\033[0;31mBesides, you can print in red.\n");     // 0: reset, then 31: red
    printf("\033[0mHopefully, you understand what's happening.\n");
    sleep(8);
    printf("\033[0m\033[2J\033[H");                           // reset, clear screen, cursor to top-left
    printf("\033[32mTry to input something, press CTRL-D to end input\n");
    while(getchar() != EOF);
    // Reset the color before exiting so the shell is not left in green.
    printf("\nYou see, the text you typed in is green.\033[0m\n");
    return 0;
}

\033 is the octal escape for the Escape character (ESC, decimal 27) in ASCII. It signals to the terminal that the characters that follow are a command rather than text to display.
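Since every sequence used above has the same shape (ESC, a bracket, numeric parameters, and a final letter), it can be convenient to build them with small helpers instead of typing the raw bytes each time. The sketch below covers only the two forms used in this post; ansi_sgr and ansi_move_to are names invented for the example.

```c
#include <assert.h>
#include <stdio.h>

/* ESC followed by '[' introduces most ANSI sequences. */
#define CSI "\033["

/* Writes the sequence that sets one display attribute, e.g. 31 for a
 * red foreground, 4 for underline, or 0 to reset everything.
 * Returns what snprintf returns: the length of the full sequence. */
int ansi_sgr(char *buf, size_t cap, int code)
{
    assert(code >= 0);
    return snprintf(buf, cap, CSI "%dm", code);
}

/* Writes the sequence that moves the cursor to a 1-based row and
 * column; row 1, column 1 is the top-left corner. */
int ansi_move_to(char *buf, size_t cap, int row, int col)
{
    assert(row >= 1 && col >= 1);
    return snprintf(buf, cap, CSI "%d;%dH", row, col);
}
```

The resulting buffer is printed like any other string, for example with fputs(buf, stdout). Only a terminal that understands ANSI sequences will act on it.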


Try Yourself

Implementing a Progress Bar with Escape Characters

Here is a simple function to print a progress bar. It redraws the bar on the same line each time it is called, so the effect comes from calling it repeatedly in a loop.

#include <stdio.h>
#include <unistd.h>

void PrintProgressBar(int progress, int interval, int theme){
    //progress is a number in range [0,100] indicating the progress of a work.
    //interval is the time waited between each function call. Count in seconds.
    //theme is the choice of theme in which the progress bar is presented.
    putchar('\r');
    printf("\033[?25l");
    static int in_frame = 0;
    const int MAXWIDTH = 50;
    const char frame[] = {'|','/','-','\\'};
    const int FRAMELEN = 4;
    int width = progress*MAXWIDTH/100;
    switch (theme){
        case 1:
            // \ | / - and number in the end
            printf("%c  %d%%", frame[in_frame],progress);
            in_frame = (in_frame + 1) % FRAMELEN;
            break;
        
        case 2:
            putchar('[');
            for(int i = 0; i < width; i++){
                putchar('=');
            }
            putchar('>');
            for(int i = width+1; i <= MAXWIDTH; i++){
                putchar(' ');
            }
            putchar(']');
            printf(" %d%%",progress);
            break;
        default:
            putchar('[');
            for(int i = 0; i < width; i++){
                putchar('=');
            }
            putchar('>');
            for(int i = width+1; i <= MAXWIDTH; i++){
                putchar(' ');
            }
            putchar(']');
            printf("  %c %d%%", frame[in_frame],progress);
            in_frame = (in_frame + 1) % FRAMELEN;
    }    
    printf("\033[K");
    //stdout is line-buffered when it is connected to a terminal: the buffer
    //is only written out when a newline is printed, when the buffer fills up,
    //or (typically) when the program reads input.
    //
    //We stay on the same line and print short strings, so none of these
    //three conditions is met and nothing would appear on screen.
    //fflush forces everything in the buffer to be written out now.
    //Flush before sleeping so that each frame is visible while we wait.
    fflush(stdout);
    sleep(interval);
}


int main(){
    for(int i = 1; i <= 100; i++){
        PrintProgressBar(i,1,0);
    }
    //Show the cursor again and end the line so the shell prompt starts on a fresh line.
    printf("\033[?25h\n");
    return 0;
}
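The drawing logic can also be separated from the terminal control, which makes it easy to check on its own. The following is a sketch of that split, not a replacement for the function above: format_bar is a hypothetical name, and it renders the same [===>   ] layout into a caller-supplied buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Renders "[====>    ]" into buf for a progress value in [0, 100].
 * inner is the number of cells shared by '=' and ' ' (50 in the code above).
 * buf must hold at least inner + 4 bytes: '[', inner cells, '>', ']', '\0'.
 * Returns the number of '=' characters drawn. */
int format_bar(char *buf, size_t cap, int progress, int inner)
{
    assert(inner > 0 && cap >= (size_t)inner + 4);
    if (progress < 0)   progress = 0;     /* clamp out-of-range input */
    if (progress > 100) progress = 100;

    int filled = progress * inner / 100;
    char *p = buf;
    *p++ = '[';
    for (int i = 0; i < filled; i++)
        *p++ = '=';
    *p++ = '>';
    for (int i = filled; i < inner; i++)
        *p++ = ' ';
    *p++ = ']';
    *p = '\0';
    return filled;
}
```

A caller would then print the buffer with something like printf("\r%s %d%%\033[K", bar, progress) followed by fflush(stdout), keeping all escape sequences in one place.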

Dancing Kaomoji (Emoticons)

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(){
    const int syn_size = 2,TIME = 100;
    char syn[][1000] = {"└|``┌|","|┐``|┘"};
    printf("\033[?25l");
    for(int T = 0; T < TIME; T++){
        for (int i = 0; i < syn_size; i++){
            printf("\r%s", syn[i]);
            fflush(stdout);
            sleep(1);

        }
    }
    //Show the cursor again and end the line so the shell prompt does not overwrite the last frame.
    printf("\033[?25h\n");
    return 0;
}