Owen Gage

Getting the ^D

What does Ctrl-D do when typed into a terminal? The typical and unsatisfying answer is it sends end-of-file (EOF) to the terminal. But what is EOF exactly? What does this trigger? Where in the immense stack of code involved is the behaviour found?

This article is in several parts. Each part answers the question, with later parts going into more detail and being more pedantic. We start with just what it does from the perspective of the user, move on to a more detailed explanation, and finally dig into the source code of the Linux kernel.

The 'short' answer

If you just want the short answer: ^D sends a character called 'end of transmission' to the terminal. This indicates to the foreground process that input can be read, and wakes that process up.

Why does the process need waking up? Because the kernel will suspend a process when it tries to read input with none to be read.

A typical flow would be

  1. You run some command, creating a process.
  2. This process attempts to read input and gets suspended by the kernel, because there is none.
  3. You type some input, each character being sent to the kernel by the terminal. This does not go to the process yet.
  4. You press ^D and the terminal sends the 'end of transmission' character to the kernel.
  5. The kernel makes the data ready to be read by the process, and wakes it up.
  6. The process resumes and receives the input.

When a user types ^D they're often trying to exit some command or shell. The process is exactly the same as above, except for (3): no input is typed.

This means that the reading process receives zero bytes of data. This is the condition used to indicate the end of file. Processes typically finish executing at this point.

You can see this behaviour if you run cat with no arguments. With cat running, type without pressing enter. You can see cat has not yet printed what you have typed (you only see what you have typed yourself). If you press ^D, cat will wake up, resume its reading, then print what you typed. If you press ^D without typing anything more, cat reads zero bytes and takes that to mean there will never be more input and exits.
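The loop at the heart of a program like cat can be sketched in a few lines. This is a minimal illustration, not cat's actual source: copy_until_eof is a name made up for this example, and it assumes a POSIX system. The important part is the check for read returning zero.

```c
#include <errno.h>
#include <unistd.h>

/* Copy everything from in_fd to out_fd, the way cat with no arguments
 * copies stdin to stdout. Returns the number of bytes copied, or -1 on
 * error. (Hypothetical helper, written for illustration only.) */
long copy_until_eof(int in_fd, int out_fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        /* On a terminal in its default mode, this blocks until the user
         * presses enter or ^D. */
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return total;       /* zero bytes: the end of file condition */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, try again */
            return -1;
        }

        /* write may accept fewer bytes than asked, so loop until done. */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

Calling copy_until_eof(STDIN_FILENO, STDOUT_FILENO) from main should behave much like cat does in the experiment above: typed text comes back after enter or ^D, and ^D on an empty line ends the loop.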

A longer answer

There are a lot of assumptions and simplifications in the above section. This section will be more pedantic and detailed.

The character

I said that the 'end of transmission' character is sent to the kernel by the terminal. What is this character? It is part of the ASCII character set, which is a mapping of common English characters to the numbers 0-127. ASCII also includes control codes, which are special non-printed characters for things like a new line, a tab, backspace, and even a bell sound. The character with a numeric value of 4 is our end of transmission character, sometimes called EOT.

This is what is sent to the terminal when you press ^D. In this sense it is a character, just not an end of file character. The reason that it's D and not some other letter is that D is the fourth letter of the alphabet. ^A sends ASCII value 1, ^B sends ASCII value 2, and so on. This is potentially due to old mechanical keyboards[1].
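The letter-to-control-code mapping can be written as a one-line bit mask. This is a sketch of the 'suppress the top bits' theory from the footnote, and ctrl_code is a hypothetical helper invented for this example rather than anything a terminal actually exposes.

```c
/* Return the control code sent for Ctrl plus the given letter.
 * Keeping only the low five bits maps 'A'..'Z' (0x41..0x5A) and
 * 'a'..'z' (0x61..0x7A) onto 1..26.
 * (Hypothetical helper, written for illustration only.) */
unsigned char ctrl_code(char letter)
{
    return (unsigned char)(letter & 0x1f);
}
```

With this, ctrl_code('D') is 0x44 & 0x1f, which is 4, the EOT character.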

This is partly why there is confusion about EOF being a character. ^D does send a character, but not EOF. Another reason for the confusion is the library function getc ('get character'), which has a synopsis of:

int getc(FILE *stream) is equivalent to fgetc() ...

int fgetc(FILE *stream) reads the next character from stream and returns it as an unsigned char cast to an int, or EOF on end of file or error.

Because this function is called 'get character' people assume everything it returns must be a character, hence EOF must be a character. The reality is that EOF is an out-of-band value signalling end of file or an error, not a character. This is why getc returns an int rather than a char: EOF is a negative value (typically -1), so it cannot fall within 0-127 as an ASCII character would[2].
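A small sketch shows why the return type matters. count_bytes is a made-up name for this example. The result of getc is stored in an int so that EOF can never be mistaken for a real byte, including the EOT byte (value 4) itself.

```c
#include <stdio.h>

/* Count the bytes remaining in a stream. The result of getc is kept in an
 * int, not a char, so that EOF (a negative value) stays distinct from
 * every byte value 0-255.
 * (Hypothetical helper, written for illustration only.) */
long count_bytes(FILE *stream)
{
    long n = 0;
    int c;

    while ((c = getc(stream)) != EOF)
        n++;
    return n;
}
```

A file containing the byte 4 is counted like any other: the loop only stops when getc reports the end of file condition (or an error), not when it sees a particular character.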

Okay, so EOT gets sent to the kernel. What does the kernel do with it? For that we need to talk about 'line discipline'.

Line discipline

Line discipline is a part of the Linux kernel that controls how input from a terminal gets to the foreground process in a shell (meaning the currently running command, or the shell itself). Let's quickly jump into the difference between a terminal and shell.

A terminal is software or hardware that accepts input from a user and displays output. This is the TTY or teletype. It used to be real hardware, but typically is now just a software program.

A shell is the software program that receives input from and produces output for the terminal. Examples of shells are Bash, Zsh, and Fish. This is the part that turns commands into actual execution.

There are several glue layers between terminals and shells implemented in the kernel. One of these layers is line discipline. There is a default/fallback implementation of it, called n_tty, as part of the terminal/TTY driver. It exists to make application developers' lives easier, and users' lives more consistent.

The n_tty default line discipline has two modes: raw and canonical. In raw mode the kernel gives processes fine-grained access to input, whereas in canonical mode the kernel waits for the user to finish editing a line before providing input to the process. Most processes will read from the terminal in canonical mode. Raw mode is mostly used by applications like shells and text editors that require exact control over what is shown.

In canonical mode, when a program reads from input, it will only receive input once the user has sent ^D or a newline. This allows the kernel itself to handle the user editing the current line on behalf of the process. A user might backspace over characters, erase a word, or discard the whole line before finally hitting enter. The kernel handling this is convenient for program developers, as they don't each have to implement basic line editing features.
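From a program's point of view, the mode is a flag in the terminal's settings, reachable through the termios interface. The following is a minimal sketch with made-up helper names. It only clears ICANON and ECHO, which is lighter than the full 'raw' setup that cfmakeraw performs, so treat it as an illustration rather than a recipe.

```c
#include <termios.h>

/* Report whether the terminal behind fd is in canonical mode.
 * Returns 1 for canonical, 0 for non-canonical, and -1 if the settings
 * cannot be read (for example because fd is not a terminal).
 * (Hypothetical helper, written for illustration only.) */
int is_canonical(int fd)
{
    struct termios t;

    if (tcgetattr(fd, &t) != 0)
        return -1;
    return (t.c_lflag & ICANON) ? 1 : 0;
}

/* Edit a settings structure so that reads return as soon as one byte is
 * available, with no kernel line editing and no echo. The caller would
 * apply it with tcsetattr(fd, TCSANOW, t) and restore the old settings
 * before exiting. */
void make_noncanonical(struct termios *t)
{
    t->c_lflag &= ~(tcflag_t)(ICANON | ECHO);
    t->c_cc[VMIN] = 1;   /* return once at least one byte has arrived */
    t->c_cc[VTIME] = 0;  /* no inter-byte timeout */
}
```

Programs that switch modes this way usually save the original struct termios first, since the setting belongs to the terminal and outlives the process.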

^D

^D is almost the same as pressing enter. The difference is that the character is never seen by the reading process; the kernel swallows it. When pressing enter, the produced newline is sent to the process.

A process reads input by calling the read system call (syscall) on standard input (stdin). A syscall is a function call into the kernel. This read is what indicates the end of file condition by returning zero bytes. This is impossible when pressing enter, as the newline will always be there, meaning at least one character is returned. With ^D, it's possible for read to return zero bytes because ^D is dropped by the kernel.

It's worth noting the process has to be programmed to stop if read returns no bytes. A process could carry on regardless, or not read stdin at all. In Bash, if you type half a command and press ^D repeatedly, nothing happens. Bash is ignoring the end of file condition[3]. It only exits if you press ^D with no command typed.

Digging into the kernel

This final section will dig into the exact code in the Linux kernel where this behaviour occurs, and round off any assumptions left standing in the previous sections.

I will explain a bit about what the kernel is, dig into the read syscall, and talk about the terminal driver.

But first, you should understand that in Linux, 'everything is a file'. Almost. Terminals have a file associated with them. These files can be created, read and deleted almost like 'regular' files. This is why the read syscall is important for us to explore.

The kernel

Almost all interaction between processes and the rest of the world is mediated by the kernel. At any given time a process may be in kernel mode or user mode.

Kernel mode means the kernel is currently executing code; this handles interaction with hardware such as reading/writing files on disk, sending data over the network, and sending sound to your speakers. User mode is pretty much everything else. All your code runs in this mode, even as root[4].

This mediation by the kernel to the outside world is through system calls (or syscalls). These are functions[5] a program can call that transition from user mode to kernel mode to perform some kernel action. For example, opening a file is through the open syscall, and reading from that file is through the already mentioned read syscall. If you're polite, you'll close it.
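Here is the open, read, close sequence as a small sketch, using the usual libc wrappers around the syscalls. read_some is a name invented for this example. /dev/null is a convenient file to try it on, since reading it hits end of file straight away.

```c
#include <fcntl.h>
#include <unistd.h>

/* Open path, read up to cap bytes into buf, and close the file again.
 * Returns what read returned: the byte count, 0 at end of file, or -1
 * on error (including a failed open).
 * (Hypothetical helper, written for illustration only.) */
ssize_t read_some(const char *path, char *buf, size_t cap)
{
    int fd = open(path, O_RDONLY);   /* open syscall */
    if (fd < 0)
        return -1;

    ssize_t n = read(fd, buf, cap);  /* read syscall */

    close(fd);                       /* being polite */
    return n;
}
```

Reading /dev/null returns 0 immediately, which is the same end of file condition a terminal produces after ^D on an empty line.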

Let's see what the read syscall looks like.

read syscall

In the Linux codebase you can find the read syscall in fs/read_write.c:

SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
{
  return ksys_read(fd, buf, count);
}

The SYSCALL_DEFINE3 is a macro that expands into some quite complex macro-magic that I won't go into (different architectures can redefine this macro, so it varies). The macro takes a list of arguments starting with the name of the syscall followed by the type then name of each syscall argument. In this case it simply calls ksys_read. This in turn calls vfs_read, which trimmed down looks like:

ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
{
    ssize_t ret;
    // snip

    if (file->f_op->read)
        ret = file->f_op->read(file, buf, count, pos);
    else if (file->f_op->read_iter)
        ret = new_sync_read(file, buf, count, pos);
    else
        ret = -EINVAL;
    // snip
    return ret;
}

This is where following the trail gets more tricky. The next key part here is the call to file->f_op->read(). This is some good old C-style object oriented programming. This read is actually a function pointer. The exact function that gets called depends on how the f_op field was set up, and the way it is set up is determined by the type of file that it is. This is very similar to interfaces and their implementations in high-level languages.

The type for the f_op field is rather complex, but represents all of the operations you can perform on a file in Linux. Here is a cut down version:

struct file_operations {
    ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
    ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
    int (*open) (struct inode *, struct file *);
    // snip
};

You can see the read field here that was called by vfs_read. A naive search of the Linux codebase shows a conservative 1500 variables implementing this file_operations type. Lots of types of files...
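The same pattern can be reproduced in miniature in ordinary user-space C. Everything below is hypothetical and exists only for this example (the toy_ names and the two 'drivers'); it is not kernel code, just the same shape: a struct of function pointers, and a caller that dispatches through it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space miniature of struct file_operations. */
struct toy_file;

struct toy_file_operations {
    long (*read)(struct toy_file *f, char *buf, size_t count);
};

struct toy_file {
    const struct toy_file_operations *f_op;
};

/* One "driver": always at end of file, like /dev/null. */
static long null_read(struct toy_file *f, char *buf, size_t count)
{
    (void)f; (void)buf; (void)count;
    return 0;
}

/* Another "driver": an endless supply of zero bytes, like /dev/zero. */
static long zero_read(struct toy_file *f, char *buf, size_t count)
{
    (void)f;
    memset(buf, 0, count);
    return (long)count;
}

static const struct toy_file_operations null_fops = { .read = null_read };
static const struct toy_file_operations zero_fops = { .read = zero_read };

/* The caller neither knows nor cares which implementation it reaches,
 * much as vfs_read does with file->f_op->read. */
long toy_vfs_read(struct toy_file *f, char *buf, size_t count)
{
    if (f->f_op->read)
        return f->f_op->read(f, buf, count);
    return -1;  /* no read operation; the kernel uses -EINVAL here */
}
```

A toy_file set up with null_fops and one set up with zero_fops go through the same toy_vfs_read call but end up in different functions, which is the whole trick.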

The one we're interested in, the one that is for terminals, is in drivers/tty/tty_io.c:

static const struct file_operations tty_fops = {
    .read_iter	= tty_read,
    .write_iter	= tty_write,
    .open		= tty_open,
    // snip
};

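Here we can see that the read_iter field points to the tty_read function. Since there is no plain read field set, vfs_read takes its new_sync_read branch for a terminal, which in turn calls read_iter; the function pointer idea is the same. So while a terminal isn't what you'd think of as a 'file', it exposes itself as one. In Linux it's better to think of a file as an abstraction for anything that can sensibly implement the file operations we saw above. This gives us a good point to jump into terminals.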

TTY driver

The terminal/teletype/TTY driver exists in drivers/tty of the kernel source. We saw that tty_read is the function that gets called when a process reads standard input. This turns out to be only half the story.

If you follow the code for tty_read it is ultimately reading a pre-existing buffer of data. A different set of functions fills this buffer as the user types.

Looking at the tty_read side, to avoid showing masses of source code, I'll just list the path through the code:

  1. tty_read calls file_tty(file) which pulls out a tty_struct* from the file object.
  2. This tty_struct has an ldisc field of type tty_ldisc*. ldisc means line discipline.
  3. ldisc has an ops field with type tty_ldisc_ops*, similar to the file operations we saw already.
  4. The n_tty implementation of this is n_tty_ops, which has a read function pointer set to n_tty_read. This is the point where other line disciplines could instead be chosen. This ops field would point to a different implementation.

n_tty_read makes received input from the terminal available to the caller of read by copying it to a kernel buffer (a buffer only accessible in kernel mode). This data ultimately ends up in the buffer provided by the caller to read. If there is no data to be read, this also puts the current process to sleep[6].

This doesn't explain how the data actually got in the TTY buffer. This is from a different operation, receive_buf.

The other side that actually fills the buffer is much harder to track down. The tty_ldisc_ops mentioned earlier with its read function also has a receive_buf function. It is this function that gets called as a result of a user typing into a terminal. It is also where the majority of the line discipline logic gets handled.

For n_tty:

  1. receive_buf points to n_tty_receive_buf,
  2. which calls n_tty_receive_buf_common,
  3. which calls __receive_buf,
  4. which calls n_tty_receive_buf_standard.

And finally things are relevant to ^D. This function looks at each character received, and looks up in a configured bitmap whether the character is 'special', meaning it has an action beyond just adding a character to the input. This includes things like sending SIGINT with ^C, erasing with backspace, and our ^D.

This bitmap that informs the kernel if a character is special can be different per terminal. Our assumption that the EOT character causes a read call to return breaks down here. The character to cause it actually depends on this bitmap and another field of the TTY structure: tty->termios.c_cc. This is an array of characters that is also configurable per TTY. Here cc refers to control characters, and the entries generally map to the equivalent ASCII control codes.

When a character is received and determined to be special, this c_cc array is checked. The index that the character is found in determines what action the kernel will take. ^D only performs EOF because these fields are set up that way. The element with index 4 in this array happens to be the one that indicates end of file, and it so happens that EOT (ASCII value 4) is typically the value of this element. In theory any character could be set up to trigger EOF.
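User space sees the same array through struct termios, where the end of file slot is named VEOF. The sketch below uses made-up helper names (is_eof_char and set_eof_char) and assumes a POSIX system. It mirrors the kernel's comparison, and shows how a program could point the EOF slot at a different character.

```c
#include <termios.h>
#include <unistd.h>

/* User-space analogue of the kernel's check: is c the character these
 * terminal settings treat as end of file? A slot holding
 * _POSIX_VDISABLE means the function is switched off.
 * (Hypothetical helper, written for illustration only.) */
int is_eof_char(const struct termios *t, unsigned char c)
{
    return t->c_cc[VEOF] != _POSIX_VDISABLE && c == t->c_cc[VEOF];
}

/* Read the terminal's settings, change its EOF character, and write the
 * settings back. Returns 0 on success, -1 on failure (for example when
 * fd is not a terminal). */
int set_eof_char(int fd, cc_t eof)
{
    struct termios t;

    if (tcgetattr(fd, &t) != 0)
        return -1;
    t.c_cc[VEOF] = eof;
    return tcsetattr(fd, TCSANOW, &t) == 0 ? 0 : -1;
}
```

The stty command exposes the same setting from a shell with its eof option, so you can experiment with a different end of file key without writing any C.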

When in canonical mode, the following executes for a special character:

if (c == EOF_CHAR(tty)) {
  c = __DISABLED_CHAR;
  goto handle_newline;
}

EOF_CHAR is a macro that returns the element of the c_cc array representing EOF (more potential for confusion about it being a character). The code sets the currently read character (in our case ^D) to a special disabled character used later, and jumps to the handle_newline label[7].

This looks like:

handle_newline:
  set_bit(ldata->read_head & (N_TTY_BUF_SIZE - 1), ldata->read_flags);
  put_tty_queue(c, ldata);
  smp_store_release(&ldata->canon_head, ldata->read_head);
  kill_fasync(&tty->fasync, SIGIO, POLL_IN);
  wake_up_interruptible_poll(&tty->read_wait, EPOLLIN | EPOLLRDNORM);

The two key bits here for us are:

  1. put_tty_queue, and
  2. wake_up_interruptible_poll.

The put_tty_queue simply adds the character to the buffer ready for a read syscall. It does this unconditionally, so the disabled character does end up written to this buffer. This is necessary, as the read side needs some indication that it should stop reading once it gets to that point. This indication is the disabled character.

The wake_up_interruptible_poll call wakes any process sleeping on the terminal's read_wait queue, which is where a blocked reader waits for input[7]. The kill_fasync call just before it serves a different audience: it sends SIGIO to processes that asked for signal-driven I/O notification. The woken process then resumes inside the read syscall it slept in, reads the data, and does whatever it needs with it. This completes our journey for ^D.
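To make the two sides concrete, here is a toy user-space model of the buffer. It is emphatically not the kernel's code: the toy_ names, the fixed-size non-circular buffer, and the one-byte-per-flag array are simplifications invented for this sketch. What it keeps is the idea above: the EOF character is stored as a disabled marker that ends the line but is never handed to the reader.

```c
#include <stddef.h>

#define TOY_BUF_SIZE 256
#define TOY_EOF_CHAR 4       /* EOT, the usual value of c_cc[VEOF] */
#define TOY_DISABLED_CHAR 0  /* stands in for the kernel's __DISABLED_CHAR */

/* Hypothetical, much simplified model of the n_tty input buffer. */
struct toy_ldisc {
    unsigned char buf[TOY_BUF_SIZE];
    unsigned char line_end[TOY_BUF_SIZE]; /* 1 where a line was completed */
    size_t read_head;   /* next slot to fill */
    size_t canon_head;  /* end of the data a reader may consume */
    size_t read_tail;   /* next slot to hand to a reader */
};

/* Input side: called once per character typed. Returns 1 when a
 * sleeping reader would be woken, 0 otherwise. */
int toy_receive_char(struct toy_ldisc *ld, unsigned char c)
{
    if (ld->read_head >= TOY_BUF_SIZE)
        return 0;  /* full: this toy model just drops the character */

    if (c == TOY_EOF_CHAR || c == '\n') {
        if (c == TOY_EOF_CHAR)
            c = TOY_DISABLED_CHAR;        /* swallow ^D, keep a marker */
        ld->line_end[ld->read_head] = 1;  /* like set_bit on read_flags */
        ld->buf[ld->read_head++] = c;     /* like put_tty_queue */
        ld->canon_head = ld->read_head;   /* release the line to readers */
        return 1;                         /* like the wake-up call */
    }
    ld->buf[ld->read_head++] = c;
    return 0;
}

/* Reader side: copy out at most one completed line. Returns the number
 * of bytes copied (0 is the end of file condition), or -1 when no
 * complete line is ready and a real reader would be put to sleep. */
long toy_read(struct toy_ldisc *ld, char *out, size_t cap)
{
    size_t n = 0;

    if (ld->read_tail == ld->canon_head)
        return -1;

    while (ld->read_tail < ld->canon_head && n < cap) {
        size_t i = ld->read_tail++;
        if (ld->line_end[i]) {
            if (ld->buf[i] != TOY_DISABLED_CHAR)
                out[n++] = (char)ld->buf[i];  /* a newline is delivered */
            break;                            /* stop at the end of the line */
        }
        out[n++] = (char)ld->buf[i];
    }
    return (long)n;
}
```

Feeding 'a', 'b', then EOT makes toy_read return 2 bytes, while feeding EOT on its own makes it return 0. That matches the two ^D behaviours seen with cat at the start of the article.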

Ending remarks

We started with a very terminal-user focused explanation of ^D, and got progressively deeper into the kernel focused explanation, quite frankly stretching the limits of my understanding. I've tried to be accurate where I'm sure, and vague where I'm not.

Raw mode line discipline is something I only touched on. This can be set up so that processes receive input immediately, or after a certain amount of time, or a certain number of characters typed. The details can be found in termios(3).

This mode is used by, among others, Bash and Vim to get better control over input, where the kernel's canonical line discipline isn't sufficient. Bash uses the readline library to do this.

My hope is this explains ^D in sufficient detail for anyone. Though I'd love to be able to trace this from physical keypress on a keyboard, it's beyond my abilities at the moment!

Cheers.

Footnotes

  1. rsclient on Reddit proposed this theory in this comment. The control key on their old mechanical keyboard would suppress the top bits of the character, making the ASCII value for D, 0x44, turn into 0x04.

  2. getc returns a single byte, so values between 0-255, when returning an actual character. It is not limited to ASCII.

  3. Bash actually parses input differently, talked about briefly at the end of the article. It is using raw mode. The point stands that programs could ignore this zero byte condition.

  4. Unless you're a kernel developer, in which case you know the difference anyway.

  5. Syscalls are only sort of functions. Programs typically call them indirectly through a wrapper such as glibc, and there is typically a CPU instruction to switch from user mode to kernel mode when calling the syscall.

  6. Unfortunately I could not follow the kernel code closely enough to show the exact mechanism that puts the process to sleep. Happy for someone to point it out!

  7. If you're not familiar with goto and labels, goto essentially lets execution leap to the labelled part of the function. It is used extensively in C for things like error handling. Try/catch fills a similar purpose in other languages, but goto is far more flexible.