Core Developer @ Hudson River Trading
On 5/4/2023, 5:47:27 PM
This is the second of a three-part series describing the tty subsystem. See the first part describing the subsystem and giving an overview of its architecture here. See the third part describing two relevant buffering data structures here.
Initially, I had thought that these "extra terminal behaviors" were unnecessary and not critical to my understanding and implementation of the terminal in Linux. I was very wrong; not only does line discipline make up many useful and familiar behaviors that create the look and feel of a terminal, but keeping it separate from the tty driver helps preserve the distinction between policy and mechanism in the kernel.
While tty devices are mostly "dumb" devices, acting as a bidirectional channel between keyboard/console (master) and process (slave), a number of useful special semantics have evolved over the years to suit the asymmetric, interactive terminal interface. Some well-known examples include:
Overall, this behavior is called the line discipline, and it describes the behavior ("policy") of the terminal device. Recall from the earlier blog post that the other major component of the tty/terminal subsystem is the terminal driver, which provides an interface to the input and output serial hardware devices ("mechanism"). All operations on a terminal device go through the line discipline interface.
The line discipline interface comprises three functions for normal operation:
- `receive_buf()`: called by the terminal driver to send input to the line discipline
- `read()`: called by the slave to read from the line discipline (input) buffer
- `write()`: called by the slave to write to the output buffer; should forward writes to the terminal driver

Additionally, the line discipline should be able to receive `ioctl`s to change its behavior, i.e., through the `termios` interface described below. This interface is defined in `struct tty_ldisc_ops` in `include/linux/tty_ldisc.h`.
The interactions between the line discipline and the tty driver are summarized in the following diagram.
             +-----------------+
             |     process     |
             +-----------------+
                 |         |
         write() |         | read()
                 v         v
             +-----------------+
             |      ldisc      |
             +-----------------+
                 |         ^
         write() |         | receive_buf()
                 v         |
             +-----------------+
             |   tty driver    |
             +-----------------+
                 ^         |
                 |         |
                 |         v
            +--------+ +--------+
            | input  | | output |
            | serial | | serial |
            | device | | device |
            +--------+ +--------+
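To make the three entry points concrete, here is a minimal user-space sketch of the interface in Python. It is only a toy model under my own assumptions: the class name `ToyLineDiscipline` and the `driver_write` callback are hypothetical, and it applies no policy at all (it passes bytes straight through), unlike the kernel's real `n_tty` implementation.

```python
class ToyLineDiscipline:
    """Hypothetical pass-through model of the ldisc interface (not kernel code)."""

    def __init__(self, driver_write):
        # driver_write stands in for the tty driver's output path.
        self._driver_write = driver_write
        self._input = bytearray()  # input buffer filled by the "driver"

    def receive_buf(self, data: bytes) -> None:
        """Called by the 'tty driver' when input arrives from the device."""
        # A real ldisc would apply its policy here (editing, echo, signals).
        self._input.extend(data)

    def read(self, nbytes: int) -> bytes:
        """Called by the process (slave) to consume buffered input."""
        chunk = bytes(self._input[:nbytes])
        del self._input[:nbytes]
        return chunk

    def write(self, data: bytes) -> int:
        """Called by the process (slave); forwards output to the 'driver'."""
        self._driver_write(data)
        return len(data)


if __name__ == "__main__":
    sent_to_device = []
    ldisc = ToyLineDiscipline(sent_to_device.append)
    ldisc.receive_buf(b"hi")   # driver -> ldisc
    print(ldisc.read(1))       # process <- ldisc
    ldisc.write(b"ok")         # process -> ldisc -> driver
    print(sent_to_device)
```

The point of the sketch is only the direction of each call: the driver pushes input in through `receive_buf()`, while the process pulls input with `read()` and pushes output with `write()`.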
Before going into specifics about line discipline behavior, we should review special ASCII characters, notation, and common keys.
In ASCII, there are 128 characters. Characters 0-31 are special, non-printable control characters, as is character 127 (delete). Characters 32-126 are printable. Any byte with the high bit set (values 128-255; historically the parity bit) is not valid ASCII and may be handled normally or filtered out using terminal settings (e.g., `ignpar`, `inpck`).
In this section we focus on the low 32 characters. Each of these characters can be entered with a control sequence; for example, ASCII 0x01 can be entered using Ctrl+A; we denote this using Emacs notation as ^A. Some keys on the keyboard are mapped to special keys, such as Enter being mapped to ^M. Important special characters are summarized in the below table; a more comprehensive table can be found here.
ASCII | Name | Key |
---|---|---|
0x00 | null | ^@ |
0x03 | break/interrupt | ^C |
0x08 | backspace | ^H |
0x0A | line feed (LF) | ^J |
0x0D | carriage return (CR) | ^M or Enter |
0x1B | escape | ^[ or Esc |
0x7F | delete | ^? or Bksp |
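The caret notation in the table follows a simple rule: a control character and its key differ only in bit 0x40. A small sketch of that mapping, where the helper names `caret` and `ctrl` are my own and not part of any library:

```python
def caret(code: int) -> str:
    """Render an ASCII code in caret notation if it is a control character."""
    if code < 0x20 or code == 0x7F:
        # Flipping bit 0x40 maps 0x01 -> 'A', 0x1B -> '[', 0x7F -> '?'.
        return "^" + chr(code ^ 0x40)
    return chr(code)


def ctrl(key: str) -> int:
    """Return the ASCII code produced by Ctrl+<key>, e.g. ctrl('C') == 0x03."""
    return ord(key.upper()) ^ 0x40


if __name__ == "__main__":
    for code in (0x00, 0x03, 0x08, 0x0A, 0x0D, 0x1B, 0x7F):
        print(f"0x{code:02X} {caret(code)}")
```

Running it prints the same ASCII/Key pairs as the table above.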
The ^H, ^J, and ^M characters are understood by a terminal console driver; they are commands to move the cursor left, down, and to the beginning of a line, respectively. The ^J character signals the end of a line in canonical mode. The ^? character is used to delete backwards in canonical mode. The ^C character is used to send the SIGINT signal when `isig` is enabled.
There may be some confusion around Enter (which produces a carriage return rather than a newline character) and Bksp (which produces a delete character rather than a backspace character). I believe this is mostly for historical reasons but am not too sure. The mixup between ^M (produced by Enter) and ^J (universally understood by Linux to mean end-of-line) is common enough that a terminal setting, `icrnl`, exists to convert ^M to ^J.
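The effect of `icrnl` can be observed without a physical keyboard by using a pseudo-terminal pair. The sketch below is written in Python, whose `termios` module wraps the same calls as the C interface; the function name `read_line_via_pty` is hypothetical, and I am assuming a Linux system where `os.openpty()` is available.

```python
import os
import select
import termios


def read_line_via_pty(typed: bytes, icrnl: bool, timeout: float = 0.5) -> bytes:
    """Type `typed` on the master side; return what a canonical-mode read sees."""
    master, slave = os.openpty()
    try:
        # attrs = [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]
        attrs = termios.tcgetattr(slave)
        attrs[0] &= ~(termios.ICRNL | termios.INLCR | termios.IGNCR)
        if icrnl:
            attrs[0] |= termios.ICRNL
        attrs[3] |= termios.ICANON   # cooked mode: reads complete on ^J
        attrs[3] &= ~termios.ECHO    # keep the master side quiet
        termios.tcsetattr(slave, termios.TCSANOW, attrs)

        os.write(master, typed)
        # The slave only becomes readable once a complete line is buffered.
        ready, _, _ = select.select([slave], [], [], timeout)
        return os.read(slave, 1024) if ready else b""
    finally:
        os.close(master)
        os.close(slave)


if __name__ == "__main__":
    print(read_line_via_pty(b"ls\r", icrnl=True))
    print(read_line_via_pty(b"ls\r", icrnl=False))
```

With `icrnl` set, the ^M sent by Enter should arrive as ^J and complete the line; with it cleared, the read should time out because no end-of-line character was ever received.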
In canonical mode (a.k.a., cooked mode), special characters may be used to provide editing within a line. Usually these are the erase (default ^? or Bksp) and kill (default ^U) keys, which erase the last character and the whole line, respectively.
Since you can edit a line, a read operation on a terminal in canonical mode will not complete until the end of line is reached (^J is sent). Similarly, no more than one line will be sent for any read command, no matter how many bytes are requested.
The opposite of cooked mode is called raw mode. In raw mode, reads return as soon as there is data (possibly throttled for performance), and the erase and kill characters have no special meaning.
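The difference between the two modes can be sketched with a pseudo-terminal pair in Python. This is an illustration under assumptions, not a reference: the helper `type_into_pty` is a name I made up, the erase and kill characters are set explicitly to ^? and ^U, and I am assuming Linux behavior.

```python
import os
import select
import termios


def type_into_pty(typed: bytes, canonical: bool, timeout: float = 0.5) -> list:
    """Write `typed` to the master and collect each read() seen by the slave."""
    master, slave = os.openpty()
    try:
        attrs = termios.tcgetattr(slave)
        attrs[0] &= ~(termios.ICRNL | termios.IXON)
        attrs[3] &= ~(termios.ECHO | termios.ISIG)
        if canonical:
            attrs[3] |= termios.ICANON
            attrs[6][termios.VERASE] = b"\x7f"  # erase: ^?
            attrs[6][termios.VKILL] = b"\x15"   # kill:  ^U
        else:
            attrs[3] &= ~termios.ICANON
            attrs[6][termios.VMIN] = 1          # return once 1 byte is available
            attrs[6][termios.VTIME] = 0
        termios.tcsetattr(slave, termios.TCSANOW, attrs)

        os.write(master, typed)
        reads = []
        while select.select([slave], [], [], timeout)[0]:
            chunk = os.read(slave, 1024)
            if not chunk:
                break
            reads.append(chunk)
        return reads
    finally:
        os.close(master)
        os.close(slave)


if __name__ == "__main__":
    # Cooked: erase removes "X", kill removes "oops", one line per read.
    print(type_into_pty(b"abX\x7fc\noops\x15ok\n", canonical=True))
    # Raw: bytes are delivered as-is, without waiting for a newline.
    print(type_into_pty(b"ab\x7f", canonical=False))
```

In the cooked case each `read()` should return at most one edited line even though 1024 bytes were requested; in the raw case the ^? byte should reach the reader unchanged.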
The line discipline uses a 4KB ringbuffer (by default) to manage data. In canonical mode, data is not sent to the application until a line feed (^J) character is written to the input buffer.
One aspect of this behavior is that when a program reads input from a cooked-mode terminal, the read call doesn't finish until the LF character is sent. A call to `getchar()` in libc would not instantly return once a character is inputted, unless the character was a line feed; instead, it would read the entire line and return the first byte of the terminal buffer.
Having a fixed-size line editing buffer also means that extra characters are discarded once the buffer is exhausted. If the input buffer is full, future characters are still processed (signals, echoing, etc.) but new characters will be lost. `termios(3)` documents this behavior. We can observe this by entering more than 4096 characters of input[1] for a program reading from stdin in cooked mode, and checking how many characters are actually received.
Note that this buffer overflow can also happen in raw mode if the buffer is not emptied quickly enough.
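Instead of pasting a long file by hand, the experiment can be approximated with a pseudo-terminal. The sketch below is a hedged illustration: `overflow_demo` is a hypothetical helper, and the exact number of bytes that survive depends on the kernel, so the code only reports the counts rather than assuming a particular value.

```python
import os
import select
import termios


def overflow_demo(n_typed: int = 6000, timeout: float = 1.0):
    """Type one very long line into a cooked-mode pty; return (sent, received)."""
    master, slave = os.openpty()
    try:
        attrs = termios.tcgetattr(slave)
        attrs[0] &= ~termios.IXON
        attrs[3] |= termios.ICANON
        attrs[3] &= ~(termios.ECHO | termios.ISIG)
        termios.tcsetattr(slave, termios.TCSANOW, attrs)

        data = b"x" * n_typed + b"\n"
        sent = len(data)
        while data:                       # os.write may accept only part of it
            data = data[os.write(master, data):]

        line = b""
        while not line.endswith(b"\n"):
            if not select.select([slave], [], [], timeout)[0]:
                break                     # nothing (more) arrived in time
            chunk = os.read(slave, 8192)
            if not chunk:
                break
            line += chunk
        return sent, len(line)
    finally:
        os.close(master)
        os.close(slave)


if __name__ == "__main__":
    sent, received = overflow_demo()
    print(f"sent {sent} bytes, read back {received} bytes")
```

On a Linux kernel with the default 4096-byte buffer, I would expect the number of bytes read back to be capped around the buffer size rather than matching what was sent.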
Usually, when interacting with a terminal, we are able to see each character that we type. This is called echoing; it works by "echoing" (copying) each byte from the input buffer to the output buffer, so that it gets displayed.
When we type characters into the terminal with echoing enabled, the characters are also written to the output buffer and displayed on the console. For printable characters, this does exactly what we expect. What happens for non-printable (control) characters?
Control characters will be printed out in Emacs notation (e.g., "^@" for Ctrl+2). The special characters are escaped before being echoed to the output buffer. The slave side receives the unescaped characters, and any special characters the slave writes to the output buffer are not escaped automatically.
Some control keys will be handled specially in cooked mode and thus not be printed, such as ^?.
Echoing can also be disabled (e.g., when entering passwords) using the `termios` interface.
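Echo escaping can be observed on a pseudo-terminal pair: bytes typed on the master side are delivered unescaped to the slave, while the echo that comes back out of the master is in caret notation. The following Python sketch assumes Linux and the `ECHOCTL` flag; `drain` and `echo_demo` are hypothetical helper names.

```python
import os
import select
import termios


def drain(fd: int, timeout: float = 0.3) -> bytes:
    """Read from fd until it stays quiet for `timeout` seconds."""
    out = b""
    while select.select([fd], [], [], timeout)[0]:
        chunk = os.read(fd, 1024)
        if not chunk:
            break
        out += chunk
    return out


def echo_demo(typed: bytes, echo: bool):
    """Return (bytes the slave reads, bytes echoed back to the master)."""
    master, slave = os.openpty()
    try:
        attrs = termios.tcgetattr(slave)
        attrs[0] &= ~(termios.ICRNL | termios.IXON)
        attrs[1] &= ~termios.OPOST  # leave output bytes untouched
        attrs[3] &= ~(termios.ICANON | termios.ISIG | termios.ECHO)
        if echo:
            attrs[3] |= termios.ECHO | termios.ECHOCTL
        attrs[6][termios.VMIN] = 1
        attrs[6][termios.VTIME] = 0
        termios.tcsetattr(slave, termios.TCSANOW, attrs)

        os.write(master, typed)
        received = drain(slave)   # what the process sees
        echoed = drain(master)    # what would be shown on the console
        return received, echoed
    finally:
        os.close(master)
        os.close(slave)


if __name__ == "__main__":
    print(echo_demo(b"a\x01", echo=True))
    print(echo_demo(b"a\x01", echo=False))
```

With echo enabled, the process side should receive the raw 0x01 byte while the console side shows the two characters `^A`; with echo disabled, nothing should come back on the master at all.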
Terminal devices in Linux can be configured using the `termios` C interface. This interface exposes the `tcgetattr()`/`tcsetattr()` functions to fetch and set the terminal configuration via `ioctl()`s, respectively. The terminal configuration exists as a set of flags that define the terminal behavior; some sample flags from the `termios` interface are shown below:
The `termios` interface is useful when writing a C program that manages terminal properties (e.g., if you are a program like `bash`); in that case, using the C interface directly is fine. For interactive use, however, `stty` is a handy shell utility for changing terminal properties on command. For example, we can enable echoing using `stty echo`, disable echoing using `stty -echo`, enable raw mode using `stty raw`, and enable cooked mode using `stty cooked`. There are many more options available to match much of the `termios` interface; see the manpages for `termios(3)` and `stty(1)`.
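As a rough illustration of how `stty echo` and `stty -echo` map onto the `termios` calls, here is a minimal Python sketch (Python's `termios` module wraps `tcgetattr()`/`tcsetattr()` directly). The helper names `set_echo` and `echo_enabled` are my own, and this is not how `stty` itself is implemented.

```python
import os
import termios


def set_echo(fd: int, enabled: bool) -> None:
    """Roughly `stty echo` / `stty -echo` for the terminal open on fd."""
    attrs = termios.tcgetattr(fd)       # fetch the current configuration
    if enabled:
        attrs[3] |= termios.ECHO        # index 3 holds the local flags (c_lflag)
    else:
        attrs[3] &= ~termios.ECHO
    termios.tcsetattr(fd, termios.TCSANOW, attrs)  # apply immediately


def echo_enabled(fd: int) -> bool:
    """Report whether the ECHO flag is currently set on fd."""
    return bool(termios.tcgetattr(fd)[3] & termios.ECHO)


if __name__ == "__main__":
    # A pseudo-terminal is used so the demo does not disturb the real terminal;
    # in a real program fd would typically be sys.stdin.fileno().
    master, slave = os.openpty()
    set_echo(slave, False)
    print(echo_enabled(slave))
    set_echo(slave, True)
    print(echo_enabled(slave))
    os.close(master)
    os.close(slave)
```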
`sh` rather than `bash`
If you try to experiment with terminal features on your own using the `stty` shell command, you may get unexpected results if using the `bash` shell. At least, it will be unexpected if you don't understand what `bash` does under the hood (as I didn't); most of the time, the good ol' Bourne shell `sh` will give the expected result.
To give a simple illustration, try entering the following experiments in `bash` and `sh`. The following experiments are all done in raw mode, by first entering `stty raw` Enter into the shell[2][3].

1. ^C `whoami` Enter
2. `abc` Bksp
3. `cat -A` Enter ^C `abc` Bksp
4. `cat -A` ^J ^C `abc` Bksp

Here are my results:
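Experiment 1, `bash`:

    $ ^C
    $ whoami
    jon
    $

Experiment 1, `sh`:

    $ ^Cwhoami^M

Experiment 2, `bash`:

    $ ab

Experiment 2, `sh`:

    $ abc^?

Experiment 3, `bash`:

    $ cat -A
    ^C^Caabbcc^?^?

Experiment 3, `sh`:

    $ cat -A^M^Cabc^?

Experiment 4, `bash`:

    $ cat -A
    ^C^Caabbcc^?^?

Experiment 4, `sh`:

    $ cat -A^J^C^Caabbcc^?^?
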
Phew! There's a lot of nuance here. Before going through each example, it'll be easier if I provide the overall reason for the differences upfront: `bash` changes the terminal settings when prompting the user for a command. That is, it provides nice line-editing features via user-level software, and not via the terminal itself. However, before `exec`-ing a program (e.g., `cat`), it restores the terminal settings. `sh` doesn't provide any custom line editing semantics in the prompt, so we see the truer terminal behavior. To summarize, `bash` overrides the terminal settings in the command prompt, while `sh` doesn't[4]; however, both share the same behavior within a program executed by the shell.
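The save, override, and restore pattern described here can be sketched as a small context manager. To be clear, this is not `bash`'s actual code: `prompt_mode` is a hypothetical name, and the exact flags a real shell changes for its prompt may differ from the two shown.

```python
import contextlib
import os
import termios


@contextlib.contextmanager
def prompt_mode(fd: int):
    """Temporarily take over line editing, then restore the user's settings."""
    saved = termios.tcgetattr(fd)            # settings the user configured
    prompt = termios.tcgetattr(fd)
    prompt[3] &= ~(termios.ICANON | termios.ECHO)  # shell edits and echoes itself
    prompt[6][termios.VMIN] = 1
    prompt[6][termios.VTIME] = 0
    termios.tcsetattr(fd, termios.TCSANOW, prompt)
    try:
        yield
    finally:
        # Restore before running a command, so the child sees the user's settings.
        termios.tcsetattr(fd, termios.TCSANOW, saved)


if __name__ == "__main__":
    master, slave = os.openpty()             # stand-in for the real terminal
    with prompt_mode(slave):
        inside = termios.tcgetattr(slave)[3]
        print("canonical while prompting:", bool(inside & termios.ICANON))
    after = termios.tcgetattr(slave)[3]
    print("canonical after restore:", bool(after & termios.ICANON))
    os.close(master)
    os.close(slave)
```

This is why a program launched from the shell sees whatever was configured with `stty`, even though the prompt itself behaves differently.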
Another thing we need to look into is exactly what `stty raw` does, since it turns off a number of terminal flags. Looking at the manpage for `stty(1)`, we see that the `raw` option is shorthand for:
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr -icrnl -ixon -ixoff -icanon -opost -isig -iuclc -ixany -imaxbel -xcase min 1 time 0
That's a handful, but the main options we care about are:
Setting just these options rather than `stty raw` should provide (almost) identical output. That should be enough to go through these examples.
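For reference, the flag list from the manpage translates fairly directly into `termios` calls. The sketch below is my own transcription in Python, not `stty`'s source: `apply_stty_raw` is a hypothetical name, and flags that the platform's `termios` module does not define are simply skipped.

```python
import os
import termios

# Flag names taken from the `stty raw` expansion quoted above.
RAW_IFLAGS_OFF = ["IGNBRK", "BRKINT", "IGNPAR", "PARMRK", "INPCK", "ISTRIP",
                  "INLCR", "IGNCR", "ICRNL", "IXON", "IXOFF", "IUCLC",
                  "IXANY", "IMAXBEL"]
RAW_OFLAGS_OFF = ["OPOST"]
RAW_LFLAGS_OFF = ["ICANON", "ISIG", "XCASE"]


def _mask(names) -> int:
    """OR together the named termios flags; unknown names contribute 0."""
    mask = 0
    for name in names:
        mask |= getattr(termios, name, 0)
    return mask


def apply_stty_raw(fd: int) -> None:
    """Clear the flags listed for `stty raw` and set min 1, time 0."""
    attrs = termios.tcgetattr(fd)
    attrs[0] &= ~_mask(RAW_IFLAGS_OFF)   # input flags
    attrs[1] &= ~_mask(RAW_OFLAGS_OFF)   # output flags
    attrs[3] &= ~_mask(RAW_LFLAGS_OFF)   # local flags
    attrs[6][termios.VMIN] = 1           # min 1
    attrs[6][termios.VTIME] = 0          # time 0
    termios.tcsetattr(fd, termios.TCSANOW, attrs)


if __name__ == "__main__":
    master, slave = os.openpty()
    apply_stty_raw(slave)
    lflag = termios.tcgetattr(slave)[3]
    print("icanon:", bool(lflag & termios.ICANON))
    print("isig:  ", bool(lflag & termios.ISIG))
    print("echo:  ", bool(lflag & termios.ECHO))
    os.close(master)
    os.close(slave)
```

Note that the list does not mention `echo`, so echoing keeps whatever setting it had before; this matters when reading the results above.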
In the `bash` example, ^C still sends the SIGINT signal despite the `-isig` flag[5]. A new prompt is printed, and the command `whoami` is entered and executed by pressing Enter, which actually sends a ^M character that `bash` translates to ^J despite the `-icrnl` flag being set. Note that the following prompt is indented; this is due to the `-opost` flag that is also set by `stty raw`.
In the `sh` equivalent, the story is much simpler. The ^C does not send a signal and is not treated specially. `whoami` is entered, followed by a ^M, which is also not treated specially. No command is executed because ^J is not sent.
This one is pretty clear. `bash` implements its own editing semantics. `sh` doesn't, and thus the ^? character that is sent when pressing Bksp is not treated as a special character.
Now, we introduce a subprocess spawned by the shell that will read from the terminal. `cat -A` echoes special characters using the Emacs caret notation.
In the `bash` version, after pressing Enter we execute the `cat` command, and it begins listening for input. The `cat` command doesn't implement any line editing like `bash` does, so it simply receives each character from the terminal and echoes it out. Since the terminal is in raw mode, the terminal returns characters one-by-one rather than waiting for the end of the line, hence the repeated characters; it also does not handle ^C and Bksp specially.
In the `sh` version, we might expect the same, except for one caveat: the Enter key sends ^M, not ^J, so we do not actually execute the `cat` command. Recall that Enter doesn't send a newline character if `icrnl` is disabled.
This is almost the same as the previous version, except that we explicitly send ^J rather than ^M/Enter.
I apologize for going into this much depth in this section, but `bash`'s behavior profoundly confused me at the beginning. My suggestion for messing around with terminal settings is to work in `sh` or `cat`, neither of which implements line-editing behavior or changes terminal settings.
- `include/linux/tty_ldisc.h`: Defines critical data structures `struct tty_ldisc` and `struct tty_ldisc_ops`.
- `drivers/tty/n_tty.c`: The default ldisc implementation.
- `drivers/tty/tty_ldisc.c`: ldisc utility functions and wrapper code.
- `termios(3)`: C API to configure terminal settings using `ioctl`s.
- `stty(1)`: Shell command to modify terminal settings; usually simpler than using `termios`.
- `N_TTY_BUF_SIZE == 4096` is the default ldisc buffer size.

1. 4096 characters is a lot of typing... easier to generate a long text file and copy-paste it into stdin.
2. You can also try them in cooked mode without doing `stty raw` beforehand, although the results will probably be expected. Understanding how the shell interacts in raw mode was the difficult part for me.
3. I am following the Emacs notation for control keys to avoid any ambiguity in the examples shown, as the Ctrl+C could look like a sequence of three characters rather than a keyboard combination.
4. If you want to see exactly what `bash` does, you can run any of the above examples under `strace bash`. Look for `ioctl`s being sent to the terminal device to set terminal settings before handling prompt input, and to reset terminal settings before executing a command. When reading the command prompt, `bash` enables raw mode and disables echoing for the prompt, and handles the raw-mode input directly.
5. Note that in the command prompt, ^C does not send any signal, since no program is currently being executed by the shell. Instead, it cancels the current prompt. In other words, shells set a SIGINT signal handler to cancel the current prompt. This is not relevant to the question at hand; I just found it interesting because it was not something I had thought about previously.
© Copyright 2023 Jonathan Lam