Getting the ^D
What does Ctrl-D do when typed into a terminal? The typical and unsatisfying answer is it sends end-of-file (EOF) to the terminal. But what is EOF exactly? What does this trigger? Where in the immense stack of code involved is the behaviour found?
This article is in several parts. Each part answers the question, with later parts going into more detail and being more pedantic. We start with just what it does from the perspective of the user, move on to a more detailed explanation, and finally dig into the source code of the Linux kernel.
The 'short' answer
If you just want the short answer: ^D sends a character called 'end of transmission' to the terminal. This indicates to the foreground process that input can be read, and wakes that process up.
Why does the process need waking up? Because the kernel will suspend a process when it tries to read input with none to be read.
A typical flow would be:
- You run some command, creating a process.
- This process attempts to read input and gets suspended by the kernel, because there is none.
- You type some input, each character being sent to the kernel by the terminal. This does not go to the process yet.
- You press ^D and the terminal sends the 'end of transmission' character to the kernel.
- The kernel makes the data ready to be read by the process, and wakes it up.
- The process resumes and receives the input.
When a user types ^D they're often trying to exit some command or shell. The process is exactly the same as above, except for the third step: no input is typed.
This means that the reading process receives zero bytes of data. This is the condition used to indicate the end of file. Processes typically finish executing at this point.
You can see this behaviour if you run cat with no arguments. With cat running, type some text without pressing enter. You can see cat has not yet printed what you have typed (you only see what you have typed yourself). If you press ^D, cat will wake up, resume its reading, then print what you typed. If you press ^D again without typing anything more, cat reads zero bytes, takes that to mean there will never be more input, and exits.
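The loop that cat runs can be sketched in a few lines of C. This is only an illustration of the idea, not cat's actual source, and copy_until_eof is a name made up for this example. The important part is that a read() result of zero is what the program treats as end of file.

```c
#include <unistd.h>

/* Copy everything from in_fd to out_fd until read() reports end of file.
 * Returns the number of bytes copied, or -1 on a read or write error. */
long copy_until_eof(int in_fd, int out_fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf); /* sleeps until input is ready */
        if (n == 0)
            return total;   /* zero bytes: the end of file condition */
        if (n < 0)
            return -1;

        /* write() may accept fewer bytes than asked for, so loop until done. */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
}
```

Calling copy_until_eof(STDIN_FILENO, STDOUT_FILENO) from main gives a stripped-down cat that behaves as described above when run in a terminal.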
A longer answer
There are a lot of assumptions and simplifications in the above section. This section will be more pedantic and detailed.
The character
I said that the 'end of transmission' character is sent to the kernel by the terminal. What is this character? It is part of the ASCII character set, which is a mapping of common English characters to the numbers 0-127. ASCII also includes control codes, which are special non-printed characters for things like a new line, a tab, backspace, and even a bell sound. The character with a numeric value of 4 is our end of transmission character, sometimes called EOT.
This is what is sent to the terminal when you press ^D. In this sense it is a character, just not an end of file character. The reason that it's D and not some other letter is that D is the fourth letter of the alphabet. ^A sends ASCII value 1, ^B ASCII value 2, etc. This is potentially due to old mechanical keyboards[1].
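The letter-to-code relationship can be written as a one-line bit mask. This is a small sketch of the arithmetic only, assuming ASCII; ctrl_code is a hypothetical helper name, not something a terminal actually calls.

```c
/* Map a letter to the control code that Ctrl+letter produces by keeping
 * only the low five bits, e.g. 'D' (0x44) becomes 0x04. */
unsigned char ctrl_code(char letter)
{
    return (unsigned char)(letter & 0x1f);
}
```

With this, ctrl_code('A') is 1, ctrl_code('C') is 3 and ctrl_code('D') is 4, for upper and lower case letters alike.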
This is partly why there is confusion about EOF being a character. ^D does send a character, but not EOF. Another reason for the confusion is the library function getc ('get character'), which has a synopsis of:
    int getc(FILE *stream) is equivalent to fgetc() ...

    int fgetc(FILE *stream) reads the next character from stream and returns it as an unsigned char cast to an int, or EOF on end of file or error.
Because this function is called 'get character' people assume everything it returns must be a character, hence EOF must be a character. The reality is that it is indicating end of file or an error, not returning a character. EOF is a negative int (typically -1), so it does not have a value within 0-127 as an ASCII character would[2].
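A short sketch makes the distinction concrete. The function name count_bytes is made up for this example. The result of getc is stored in an int, which is what lets the out-of-band EOF value be told apart from every real byte, including 4 (EOT) and 255.

```c
#include <stdio.h>

/* Count the bytes in a stream. The result of getc() is kept in an int so
 * that the byte 0xFF (255) stays distinct from EOF, which is negative. */
long count_bytes(FILE *stream)
{
    long count = 0;
    int c;  /* int, not char: it must be able to hold EOF as well */

    while ((c = getc(stream)) != EOF)
        count++;
    return count;
}
```

A file containing the EOT byte is counted like any other file. The byte with value 4 is just data here and does not end the loop.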
Okay, so EOT gets sent to the kernel. What does the kernel do with it? For that we need to talk about 'line discipline'.
Line discipline
Line discipline is a part of the Linux kernel that controls how input from a terminal gets to the foreground process in a shell (meaning the currently running command, or the shell itself). Let's quickly jump into the difference between a terminal and shell.
A terminal is software or hardware that accepts input from a user and displays output. This is the TTY or teletype. It used to be real hardware, but typically is now just a software program.
A shell is the software program that receives input from and produces output for the terminal. Examples of shells are Bash, Zsh, and Fish. This is the part that turns commands into actual execution.
There are several glue layers between terminals and shells implemented in the kernel. One of these layers is line discipline. There is a default/fallback implementation of this, called n_tty, as part of the terminal/TTY driver. It exists to make application developers' lives easier, and users' lives more consistent.
The n_tty default line discipline has two modes: raw and canonical. In raw mode the kernel gives processes fine-grained access to input, whereas in canonical mode the kernel waits for the user to finish editing a line before providing input to the process. Most processes will read from the terminal in canonical mode. Raw mode is mostly used by applications like shells and text editors that require exact control over what is shown.
In canonical mode, when a program reads from input, it will only receive input once the user has sent ^D or a newline. This allows the kernel itself to handle the user editing the current line on behalf of the process. A user might backspace over characters, erase a word, or discard the whole line before finally hitting enter. The kernel handling this is convenient for program developers, as they don't each have to implement line editing features.
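Programs choose between the two modes through the termios interface. The sketch below shows one way a program might switch canonical mode off. The function names are invented for this example, while tcgetattr, tcsetattr, ICANON, VMIN and VTIME are the standard termios names.

```c
#include <termios.h>

/* Return a copy of `settings` with canonical mode switched off, so read()
 * returns as soon as one byte arrives instead of waiting for a full line. */
struct termios without_canonical_mode(struct termios settings)
{
    settings.c_lflag &= ~(tcflag_t)ICANON; /* no kernel line editing */
    settings.c_cc[VMIN] = 1;               /* read() needs at least one byte */
    settings.c_cc[VTIME] = 0;              /* and never times out */
    return settings;
}

/* Apply the change to a terminal, storing the old settings in *saved.
 * Returns 0 on success, -1 on error (for example if fd is not a terminal). */
int disable_canonical_mode(int fd, struct termios *saved)
{
    if (tcgetattr(fd, saved) < 0)
        return -1;
    struct termios changed = without_canonical_mode(*saved);
    return tcsetattr(fd, TCSANOW, &changed);
}
```

A program using this should restore the saved settings with tcsetattr before it exits, otherwise the shell it returns to inherits the changed terminal.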
^D
^D is almost the same as pressing enter. The difference is that the character is never seen by the reading process; the kernel swallows it. When pressing enter, the produced newline is sent to the process.
A process reads input by calling the read system call (syscall) on standard input (stdin). A syscall is a function call into the kernel. This read is what indicates the end of file condition by returning zero bytes. This is impossible when pressing enter, as the newline will always be there, meaning at least one character is returned. With ^D, it's possible for read to return zero bytes because ^D is dropped by the kernel.
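This can be checked without a keyboard by using a pseudo-terminal, where a program plays the part of the user. The sketch below is Linux-specific and rests on assumptions: it opens /dev/ptmx directly, uses the TIOCSPTLCK and TIOCGPTN ioctls to unlock and locate the other side, and relies on a new pseudo-terminal starting in canonical mode with ^D as its EOF character. The function names are made up for this example.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Open a Linux pseudo-terminal pair. On success stores the "keyboard" side
 * in *master and the "process" side in *slave and returns 0; otherwise -1. */
int open_pty_pair(int *master, int *slave)
{
    int unlock = 0;
    unsigned int number;
    char path[32];

    *master = open("/dev/ptmx", O_RDWR | O_NOCTTY);
    if (*master < 0)
        return -1;
    if (ioctl(*master, TIOCSPTLCK, &unlock) < 0 || /* unlock the other side */
        ioctl(*master, TIOCGPTN, &number) < 0) {   /* find N in /dev/pts/N */
        close(*master);
        return -1;
    }
    snprintf(path, sizeof path, "/dev/pts/%u", number);
    *slave = open(path, O_RDWR | O_NOCTTY);
    if (*slave < 0) {
        close(*master);
        return -1;
    }
    return 0;
}

/* "Type" len bytes into a fresh terminal, then do a single read() on the
 * process side. Returns what read() returned, or -2 if no pty is available.
 * The typed bytes must end a line (newline or ^D) or read() never returns. */
long read_after_typing(const char *typed, size_t len, char *out, size_t cap)
{
    int master, slave;
    if (open_pty_pair(&master, &slave) < 0)
        return -2;

    long n = -1;
    if (write(master, typed, len) == (ssize_t)len)
        n = (long)read(slave, out, cap); /* blocks until a line is complete */
    close(slave);
    close(master);
    return n;
}
```

On a typical Linux system, typing "abc" followed by byte 4 should make the read return 3 bytes, a lone byte 4 should make it return 0, and "abc" followed by a newline should return 4 bytes because the newline is delivered.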
It's worth noting the process has to be programmed to stop if read returns no bytes. A process could carry on regardless, or not read stdin at all. In Bash, if you type half a command and press ^D repeatedly, nothing happens. Bash is ignoring the end of file condition[3]. It only exits if you press ^D with no command typed.
Digging into the kernel
This final section will dig into the exact code in the Linux kernel where this behaviour occurs, and round off any assumptions left standing in the previous sections.
I will explain a bit about what the kernel is, dig into the read syscall, and talk about the terminal driver.
But first, you should understand that in Linux, 'everything is a file'. Almost. Terminals have a file associated with them. These files can be created, read and deleted almost like 'regular' files. This is why the read syscall is important for us to explore.
The kernel
Almost all interaction between processes and the rest of the world is mediated by the kernel. At any given time a process may be in kernel mode or user mode.
Kernel mode means the kernel is currently executing code; this handles interaction with hardware such as reading/writing files on disk, sending data over the network, and sending sound to your speakers. User mode is pretty much everything else. All your code runs in this mode, even as root[4].
This mediation by the kernel to the outside world is through system calls (or syscalls). These are functions[5] a program can call that transition from user mode to kernel mode to perform some kernel action. For example, opening a file is through the open syscall, and reading from that file is through the already mentioned read syscall. If you're polite, you'll close it.
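To make the 'only sort of functions' point concrete, here is a sketch that performs the read syscall by number through glibc's generic syscall() helper rather than through the usual read() wrapper. It assumes Linux with glibc, and raw_read is a name invented for this example.

```c
#include <sys/syscall.h>
#include <unistd.h>

/* Invoke the read syscall by number through the generic syscall() helper
 * instead of the usual read() wrapper. Linux-specific. */
long raw_read(int fd, void *buf, size_t count)
{
    return syscall(SYS_read, fd, buf, count);
}
```

Both routes end up at the same kernel entry point. The wrapper exists mainly so that programs get a typed, portable function to call.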
Let's see what the read syscall looks like.
read syscall
In the Linux codebase you can find the read syscall in fs/read_write.c:

    SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
    {
        return ksys_read(fd, buf, count);
    }

The SYSCALL_DEFINE3 is a macro that expands into some quite complex macro-magic that I won't go into (different architectures can redefine this macro, so it varies). The macro takes a list of arguments starting with the name of the syscall followed by the type then name of each syscall argument. In this case it simply calls ksys_read.
This in turn calls vfs_read, which trimmed down looks like:

    ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
    {
        ssize_t ret;
        // snip
        if (file->f_op->read)
            ret = file->f_op->read(file, buf, count, pos);
        else if (file->f_op->read_iter)
            ret = new_sync_read(file, buf, count, pos);
        else
            ret = -EINVAL;
        // snip
        return ret;
    }

This is where following the trail gets more tricky. The next key part here is the call to file->f_op->read(). This is some good old C-style object oriented programming. This read is actually a function pointer. The exact function that gets called depends on how the f_op field was set up, and the way it is set up is determined by the type of file that it is. This is very similar to interfaces and their implementations in high-level languages.
The type for the f_op field is rather complex, but represents all of the operations you can perform on a file in Linux. Here is a cut down version:

    struct file_operations {
        ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
        ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
        int (*open) (struct inode *, struct file *);
        // snip
    };

You can see the read field here that was called by vfs_read. A naive search of the Linux codebase shows a conservative 1500 variables implementing this file_operations type. Lots of types of files...
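The pattern is easier to see in a small user-space imitation. Everything below is a toy written for this article and not kernel code: the toy_ names, mem_fops and null_fops are invented, and the error value is a stand-in. It shows one caller, toy_vfs_read, working with two different kinds of 'file' purely through the function pointer.

```c
#include <stddef.h>
#include <string.h>

struct toy_file;

/* A cut-down, user-space imitation of the kernel's file_operations. */
struct toy_file_operations {
    long (*read)(struct toy_file *file, char *buf, size_t count);
};

struct toy_file {
    const struct toy_file_operations *f_op; /* chosen by the type of file */
    const char *data;                       /* backing bytes, if any */
    size_t pos;
};

/* "Regular file": hand out the stored bytes, then report end of file. */
static long mem_read(struct toy_file *file, char *buf, size_t count)
{
    size_t left = strlen(file->data) - file->pos;
    size_t n = count < left ? count : left;
    memcpy(buf, file->data + file->pos, n);
    file->pos += n;
    return (long)n;
}

/* "Device" that is always empty, much like /dev/null. */
static long null_read(struct toy_file *file, char *buf, size_t count)
{
    (void)file; (void)buf; (void)count;
    return 0;
}

const struct toy_file_operations mem_fops = { .read = mem_read };
const struct toy_file_operations null_fops = { .read = null_read };

/* The caller never needs to know which implementation it is talking to. */
long toy_vfs_read(struct toy_file *file, char *buf, size_t count)
{
    if (file->f_op->read)
        return file->f_op->read(file, buf, count);
    return -1; /* stand-in for -EINVAL */
}
```

Adding a new kind of file means writing one more read function and one more operations table. The caller does not change, which is the property the kernel relies on.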
The one we're interested in, the one that is for terminals, is in drivers/tty/tty_io.c:

    static const struct file_operations tty_fops = {
        .read_iter = tty_read,
        .write_iter = tty_write,
        .open = tty_open,
        // snip
    };

Here we can see that read_iter points to the normal tty_read function. There is no plain read entry, so for a terminal vfs_read takes the read_iter branch, which ends up calling tty_read. So while a terminal isn't what you'd think of as a 'file', it exposes itself as one. In Linux it's better to think of a file as an abstraction for anything that can sensibly implement the file operations we saw above. This gives us a good point to jump into terminals.
TTY driver
The terminal/teletype/TTY driver exists in drivers/tty of the kernel source. We saw that tty_read is the function that gets called when a process reads standard input. This turns out to be only half the story.
If you follow the code for tty_read it is ultimately reading a pre-existing buffer of data. This buffer is filled, as the user types input, by a different set of functions.
Looking at the tty_read side, to avoid showing masses of source code, I'll just list the path through the code:
- tty_read calls file_tty(file) which pulls out a tty_struct* from the file object.
- This tty_struct has an ldisc field of type tty_ldisc*. ldisc means line discipline.
- ldisc has an ops field with type tty_ldisc_ops*, similar to the file operations we saw already.
- The n_tty implementation of this is n_tty_ops, which has a read function pointer set to n_tty_read. This is the point where other line disciplines could instead be chosen. This ops field would point to a different implementation.
n_tty_read makes received input from the terminal available to the caller of read by copying it to a kernel buffer (a buffer only accessible in kernel mode). This data ultimately ends up in the buffer provided by the caller to read. If there is no data to be read, this also puts the current process to sleep[6].
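From user space, a process can ask whether a read would put it to sleep before committing to it. The sketch below uses the standard poll() call for that. The helper names input_ready and read_if_ready, and the -2 return value for a timeout, are choices made for this example.

```c
#include <poll.h>
#include <unistd.h>

/* Report whether a read() on fd would return straight away.
 * Returns 1 if it would (data or end of file is waiting), 0 if nothing
 * arrived within timeout_ms milliseconds, and -1 on error. */
int input_ready(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc < 0)
        return -1;
    return rc > 0;
}

/* Read only if input is already waiting, so the caller is never put to
 * sleep for longer than timeout_ms. Returns the read() result, -1 on a
 * poll error, or -2 if the wait timed out. */
long read_if_ready(int fd, char *buf, size_t cap, int timeout_ms)
{
    int ready = input_ready(fd, timeout_ms);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return -2;
    return (long)read(fd, buf, cap);
}
```

On a terminal in canonical mode, input_ready should only report 1 once a whole line is available, that is after enter or ^D, because that is when the kernel makes the line readable.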
This doesn't explain how the data actually got in the TTY buffer. This is from a different operation, receive_buf.
The other side that actually fills the buffer is much harder to track down. The tty_ldisc_ops mentioned earlier with its read function also has a receive_buf function. It is this function that gets called as a result of a user typing into a terminal. It is also where the majority of the line discipline logic gets handled.
For n_tty:
- receive_buf points to n_tty_receive_buf,
- which calls n_tty_receive_buf_common,
- which calls __receive_buf,
- which calls n_tty_receive_buf_standard.
And finally things are relevant to ^D. This function looks at each character received, and looks up in a configured bitmap whether the character is 'special', meaning it has an action beyond just being added to the input. This includes things like sending SIGINT with ^C, erasing with backspace, and our ^D.
This bitmap that informs the kernel if a character is special can be different per terminal. Our assumption that the EOT character causes a read call to return breaks down here. The character to cause it actually depends on this bitmap and another field of the TTY structure: tty->termios->c_cc. This is an array of characters that is also configurable per TTY. Here cc refers to control characters and generally maps to its equivalent ASCII control codes.
When a character is received and determined to be special, this c_cc array is checked. The index that the character is found in determines what action the kernel will take. ^D only performs EOF because these fields are set up that way. The element with index 4 in this array happens to be the one that indicates end of file, and it so happens that EOT (ASCII value 4) is typically the value of this element. In theory any character could be set up to trigger EOF.
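User space can inspect and change this element through termios, which is what the command stty eof does. The following is a sketch using the standard tcgetattr and tcsetattr calls and the VEOF index. The function names are invented for this example.

```c
#include <termios.h>

/* Pure helper: change which character means end of file in `settings`. */
void choose_eof_char(struct termios *settings, cc_t c)
{
    settings->c_cc[VEOF] = c;
}

/* Return the character that currently triggers end of file on terminal fd,
 * or -1 if the settings cannot be read (e.g. fd is not a terminal). */
int get_eof_char(int fd)
{
    struct termios settings;
    if (tcgetattr(fd, &settings) < 0)
        return -1;
    return settings.c_cc[VEOF];
}

/* Make a different character trigger end of file on terminal fd.
 * Returns 0 on success, -1 on error. */
int set_eof_char(int fd, cc_t c)
{
    struct termios settings;
    if (tcgetattr(fd, &settings) < 0)
        return -1;
    choose_eof_char(&settings, c);
    return tcsetattr(fd, TCSANOW, &settings);
}
```

For instance, set_eof_char(STDIN_FILENO, 24) would make ^X (ASCII value 24) end input on the current terminal, after which ^D would be delivered to the reading process as an ordinary byte.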
When in canonical mode, the following executes for a special character:

    if (c == EOF_CHAR(tty)) {
        c = __DISABLED_CHAR;
        goto handle_newline;
    }

EOF_CHAR is a macro that returns the element of the c_cc array representing the EOF (more potential for confusion about being a character).
This sets the currently read character (in our case ^D) to a special disabled character used later, and jumps to the handle_newline label[7].
This looks like:

    handle_newline:
        set_bit(ldata->read_head & (N_TTY_BUF_SIZE - 1), ldata->read_flags);
        put_tty_queue(c, ldata);
        smp_store_release(&ldata->canon_head, ldata->read_head);
        kill_fasync(&tty->fasync, SIGIO, POLL_IN);
        wake_up_interruptible_poll(&tty->read_wait, EPOLLIN | EPOLLRDNORM);

The two key bits here for us are:
- put_tty_queue, and
- wake_up_interruptible_poll.
The put_tty_queue simply adds the character to the buffer ready for a read syscall. It does this unconditionally, so the disabled character does end up written to this buffer. This is necessary, as the read side needs some indication that it should stop reading once it gets to that point. This indication is the disabled character.
The wake_up_interruptible_poll call wakes any process sleeping on the terminal's read_wait queue, telling it data is available[7]. (The kill_fasync call just before it only sends SIGIO to processes that asked for signal-driven I/O on the terminal.) The process then resumes inside the read syscall it slept in, reads the data, and does whatever it needs with it.
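The interplay between the receive side and the read side can be modelled in user space. The code below is a heavily simplified toy and not the kernel's implementation: every toy_ name is invented, the marker value is arbitrary, buffer overflow is ignored, and sleeping and waking are reduced to a -1 return value. It only shows how a swallowed EOF character can still mark where a read must stop.

```c
#include <stddef.h>

#define TOY_BUF_SIZE 64        /* a power of two, like N_TTY_BUF_SIZE */
#define TOY_DISABLED_CHAR '\0' /* stand-in for the kernel's __DISABLED_CHAR */

struct toy_ldisc {
    unsigned char buf[TOY_BUF_SIZE];
    unsigned char line_end[TOY_BUF_SIZE]; /* 1 where a line was terminated */
    size_t read_head;       /* where the next received character goes */
    size_t canon_head;      /* everything before this is ready for read */
    size_t read_tail;       /* where the next read starts */
    unsigned char eof_char; /* plays the role of the EOF entry in c_cc */
};

/* Receive side: called once per "typed" character. */
void toy_receive(struct toy_ldisc *ld, unsigned char c)
{
    int ends_line = 0;

    if (c == ld->eof_char) {   /* the EOF character is swallowed ... */
        c = TOY_DISABLED_CHAR; /* ... and replaced by a marker */
        ends_line = 1;
    } else if (c == '\n') {
        ends_line = 1;
    }
    size_t slot = ld->read_head & (TOY_BUF_SIZE - 1);
    ld->buf[slot] = c;
    ld->line_end[slot] = (unsigned char)ends_line;
    ld->read_head++;
    if (ends_line)
        ld->canon_head = ld->read_head; /* publish the line to readers */
}

/* Read side: copy at most one completed line. Returns the byte count, or
 * -1 where a real read would go to sleep because no line is ready. */
long toy_read(struct toy_ldisc *ld, char *out, size_t cap)
{
    if (ld->read_tail == ld->canon_head)
        return -1;

    size_t n = 0;
    while (ld->read_tail != ld->canon_head && n < cap) {
        size_t slot = ld->read_tail & (TOY_BUF_SIZE - 1);
        unsigned char c = ld->buf[slot];
        int ends_line = ld->line_end[slot];

        ld->read_tail++;
        if (!(ends_line && c == TOY_DISABLED_CHAR))
            out[n++] = (char)c; /* the marker itself is never copied */
        if (ends_line)
            break;
    }
    return (long)n;
}
```

Feeding this model 'a', 'b' and then the EOF character makes toy_read return 2. Feeding it the EOF character alone makes toy_read return 0, which is the end of file condition from the earlier sections.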
This completes our journey for ^D.
Ending remarks
We started with a very terminal-user focused explanation of ^D, and got progressively deeper into the kernel focused explanation, quite frankly stretching the limits of my understanding. I've tried to be accurate where I'm sure, and vague where I'm not.
Raw mode line discipline is something I only touched on. This can be set up so that processes receive input immediately, or after a certain amount of time, or a certain number of characters typed. The details can be found in termios(3).
This mode is used by, among others, Bash and Vim to get better control over input, where the kernel's canonical line discipline isn't sufficient. Bash uses the readline library to do this.
My hope is this explains ^D in sufficient detail for anyone. Though I'd love to be able to trace this from physical keypress on a keyboard, it's beyond my abilities at the moment!
Cheers.
Footnotes
[1] rsclient on Reddit proposed this theory in this comment. The control key on their old mechanical keyboard would suppress the top bits of the character, making the ASCII value for D, 0x44, turn into 0x04.
[2] getc returns a single byte, so values between 0-255, when returning an actual character. It is not limited to ASCII.
[3] Bash actually parses input differently, talked about briefly at the end of the article. It is using raw mode. The point stands that programs could ignore this zero byte condition.
[4] Unless you're a kernel developer, in which case you know the difference anyway.
[5] Syscalls are only sort of functions. Programs typically call them indirectly through a wrapper such as glibc, and there is typically a CPU instruction to switch from user mode to kernel mode when calling the syscall.
[6] Unfortunately I could not follow the kernel code closely enough to show the exact mechanism that puts the process to sleep. Happy for someone to point it out!
[7] If you're not familiar with goto and labels, goto essentially lets execution leap to the labelled part of the function. It is used extensively in C for things like error handling. Try/catch fills a similar purpose in other languages, but goto is far more flexible.