Jonathan Lam

Core Developer @ Hudson River Trading


Understanding the tty subsystem: Data structures

On 5/10/2023, 11:12:07 PM

Return to blog

This is part three of the discussion on the tty subsystem (part 1, part 2). Here we cover two essential data structures that are useful for providing efficient and reliable buffering1.

Recall the architecture of the tty subsystem for the input path. When the hardware device receives an input event, the tty driver sends this event to the tty line discipline. The process can then read input from the line discipline.

There are two use cases for buffering. The line discipline may require significant processing, so we may not want to call it directly from the tty driver, which will often be probably in interrupt context. Thus, we require some buffering to quickly store input from the tty driver, and defer the flushing of this buffer to the line discipline outside of interrupt context. This buffering mechanism is implemented using the flip buffer data structure.

We also require a buffer to store data in the line discipline before it is read by the process. To maximize line size, we use a ringbuffer (fixed-size queue) to implement the line buffer. We also need to handle both raw and canonical mode.

Both buffering methods are best-effort2. If the flip buffers don't get a chance to flush to the line discipline before they fill up, then data is lost before it reaches the line discipline. Similarly, if the line buffer is filled at a rate faster than read()s from the slave process empty it, then data will be lost before it is sent to the process3.

Note that the flip-buffering is an optimization used to keep interrupt-context execution short, whereas the line buffer is necessary to buffer data for subsequent calls to read() on the tty device.

Flip buffers

Flip buffers provide a simple best-effort mechanism for a producer-consumer queue, with minimal need for synchronization. The idea is that we use two buffers: one for receiving data, and one for reading data4. Since the producer and consumer operate on separate arrays, synchronization is minimized (the producer and consumer only need to agree on which buffer is the producer/consumer buffer).

Input events fill up the producer buffer; once it is full, the producer buffer is marked as full. If the consumer buffer is empty, then we can swap the producer and consumer buffers, so that new input events will go into the new producer buffer (which was previously the consumer buffer), and the consumer will be notified that data is available to read. If the consumer buffer is not empty, then we cannot swap the buffers yet, and any new data will be lost.

Conversely, the tty ldisc reads from the consumer buffer. Once it is empty, the consumer buffer is marked as empty. If the producer buffer is full, then we can swap the producer and consumer buffers, and the ldisc can continue reading from the new consumer buffer (which was previously the producer buffer).

Under normal operation, we hope that the consumer can read data from the consumer buffer faster than the producer fills up the new producer buffer, so that we don't have data loss. What actually happens depends heavily on the nature of the serial input device (e.g., producing very intermittent data such as a keyboard device, or a higher-frequency serial device), the size of the flip buffers, and the nature of the consumer (e.g., how much processing goes on in the tty ldisc consumer). The process of "notifying" the consumer that data is available is usually done by scheduling the producer, so scheduling latency will also have to be considered.

To improve the reliability of flip-buffering, Linux uses more than two flip-buffers. There exists a pool of flip buffers that may be used. One is always chosen as the producer buffer; once the producer buffer is full, it is pushed onto a queue (linked list) of consumer buffers for the consumer to process. Data is only lost if the flip buffer pool is exhausted.

To see the action of line buffers in action in Linux, it is instructive to look at the flush_to_ldisc() function in drivers/tty/tty_buffer.c. This is the function that asynchronously flushes data to the line discipline when there is data available, and it demonstrates how to walk the buffer chain.

Line buffer

The line discipline maintains a queue to buffer terminal input before it is read. A ringbuffer is used; any incoming data from the flip buffers is stored at the commit head pointer, and the head pointer is then advanced. Data is read from the read tail pointer. If the buffer is full (head == tail-1), no additional data will be written to the buffer. If the buffer is empty (head == tail), then any calls to read() will block until new data comes in.

In raw mode, reads occur between at the tail up to the commit head. In canonical mode, we need additional provisions to handle line-editing. To do this, we have another pointer, called the canonical head pointer. This points to the latest newline character (start of the latest line in the header). Characters before this pointer are finalized, so reads now happen until the read tail reach the canonical head. Characters between the canonical head and the commit (raw-mode) head can be edited using line-editing characters such as C-? (erase last character) and C-U (kill whole line).

It is very instructive to look at the n_tty_receive_buf() and n_tty_read() API functions in drivers/tty/n_tty.c, where the default line discipline is implemented. n_tty_receive_buf() implements the filling of the buffer, and n_tty_read() implements the emptying of the buffer. For example, see the following excerpt, which is called by n_tty_read() in canonical mode:

 * canon_copy_from_read_buf	-	copy read data in canonical mode
 * @tty: terminal device
 * @kbp: data
 * @nr: size of data
 * Helper function for n_tty_read(). It is only called when %ICANON is on; it
 * copies one line of input up to and including the line-delimiting character
 * into the result buffer.
 * Note: When termios is changed from non-canonical to canonical mode and the
 * read buffer contains data, n_tty_set_termios() simulates an EOF push (as if
 * C-d were input) _without_ the %DISABLED_CHAR in the buffer. This causes data
 * already processed as input to be immediately available as input although a
 * newline has not been received.
 * Locking:
 *  * called under the %atomic_read_lock mutex
 *  * n_tty_read()/consumer path:
 *	caller holds non-exclusive %termios_rwsem;
 *	read_tail published
static bool canon_copy_from_read_buf(struct tty_struct *tty,
				     unsigned char **kbp,
				     size_t *nr)
	struct n_tty_data *ldata = tty->disc_data;
	size_t n, size, more, c;
	size_t eol;
	size_t tail, canon_head;
	int found = 0;

	/* N.B. avoid overrun if nr == 0 */
	if (!*nr)
		return false;

	canon_head = smp_load_acquire(&ldata->canon_head);
	n = min(*nr, canon_head - ldata->read_tail);

	tail = ldata->read_tail & (N_TTY_BUF_SIZE - 1);
	size = min_t(size_t, tail + n, N_TTY_BUF_SIZE);

	n_tty_trace("%s: nr:%zu tail:%zu n:%zu size:%zu\n",
		    __func__, *nr, tail, n, size);

	eol = find_next_bit(ldata->read_flags, size, tail);
	more = n - (size - tail);
	if (eol == N_TTY_BUF_SIZE && more) {
		/* scan wrapped without finding set bit */
		eol = find_first_bit(ldata->read_flags, more);
		found = eol != more;
	} else
		found = eol != size;

	n = eol - tail;
	if (n > N_TTY_BUF_SIZE)
		n += N_TTY_BUF_SIZE;
	c = n + found;

	if (!found || read_buf(ldata, eol) != __DISABLED_CHAR)
		n = c;

	n_tty_trace("%s: eol:%zu found:%d n:%zu c:%zu tail:%zu more:%zu\n",
		    __func__, eol, found, n, c, tail, more);

	tty_copy(tty, *kbp, tail, n);
	*kbp += n;
	*nr -= n;

	if (found)
		clear_bit(eol, ldata->read_flags);
	smp_store_release(&ldata->read_tail, ldata->read_tail + c);

	if (found) {
		if (!ldata->push)
			ldata->line_start = ldata->read_tail;
			ldata->push = 0;
		return false;

	/* No EOL found - do a continuation retry if there is more data */
	return ldata->read_tail != canon_head;

Breaking this down...

  1. canon_head = smp_load_acquire(&ldata->canon_head); Get the current (published5) canonical head pointer.
  2. n = min(*nr, canon_head - ldata->read_tail); We assign n to be the number of number of bytes we read. This is the minimum of the requested bytes (*nr) and the available bytes in the line buffer (canon_head - read_tail). Note that the canon_head and read_tail variables are actually monotonic integers and do not wrap around, so the difference should always be positive. We get their positions in the line buffer by using a 0xfff mask (since the line buffer is 4096 bytes).
  3. tail = ldata->read_tail & (N_TTY_BUF_SIZE - 1); Perform the masking as described above to get the tail pointer's position in the line buffer.
  4. size = min_t(size_t, tail + n, N_TTY_BUF_SIZE); The end of the read operation, assuming no wraparound. We will deal with the wraparound shortly.
  5. eol = find_next_bit(ldata->read_flags, size, tail); Get the position of the first newline in the range [tail, size]. Newlines are marked using the ldata->read_flags bitmap. We need to know this because we return at maximum one line at a time in canonical mode. If there is wraparound and we haven't found the end of the line, we look for it in the wrapped portion (this comes a few lines later).
  6. more = n - (size - tail); (size-tail) is the size of the unwrapped bytes to read. more is the size of the wrapped bytes to read.
  7. found = ...; found is true iff a newline is found in the first n bytes. The definition of found depends on whether a newline was found in the unwrapped section or in the wrapped section.
  8. n = eol - tail; If we've found a newline character in the first n bytes, stop reading at that newline character. (If found is false, then eol == size+more and the value of n shouldn't change.)
  9. if (n > N_TTY_BUF_SIZE) n += N_TTY_BUF_SIZE; If there was wraparound, then n will be negative, hence this correction. We check if it is > N_TTY_BUF_SIZE rather than checking if it is less than zero because n is an unsigned integer.
  10. c = n + found; c is the number of bytes to be removed from the buffer, including the newline (if found).
  11. tty_copy(tty, *kbp, tail, n); Perform a ringbuffer copy from tail with length n to the output buffer *kbp.
  12. *kbp += n; Update the position of the pointer in the output buffer.
  13. *nr -= n; Update the number of bytes remaining from the originally requested read size.
  14. if (found) clear_bit(eol, ldata->read_flags); Clear the newline from the bitmap (if found).
  15. smp_store_release(&ldata->read_tail, ldata->read_tail + c); Update the read tail pointer, discarding the next c bytes. Publish this using acquire-release semantics.
  16. return ldata->read_tail != canon_head; Returns true iff there is more data to be read.

In the Linux kernel



1. I wanted to write about this because I had trouble understanding the design and implementation of both of these data structures. I was only able to find vague explanations of them online. Reading the Linux source code gave me a much more concrete understanding of the data structures, and I hope to summarize what I learned here for those in the same position as me.

2. The input path must be best-effort because we cannot the producer can't sleep if the consumer can't keep up. On the output path, the producer can sleep or defer output if the output buffer is currently full.

3. Data loss also happens if the line length is longer than the line buffer size in canonical mode, but this is a byproduct of the canonical mode protocol.

4. This sounds like the multiple buffering technique used in graphics rendering, but the motivation and side-effects are not quite the same. Both involve a separate consumer/producer buffer that allow for parallel producer and consumer action. Multiple buffering is used to reduce visual artifacts due to combining multiple frames in one framebuffer. Flip buffers are used on an inherently serial data stream to reduce latency of the producer code (running in interrupt context). However, loss of coordination between producer and consumer is normal and expected when rendering, causing frame drops; but a slower consumer than producer in the flip buffers causes data loss. Luckily, this is typically tolerable or unlikely for serial devices.

5. An acquire-release memory ordering model is used here for multi-core synchronization.

© Copyright 2023 Jonathan Lam