Terminals and Shells 3: Bits and pieces
This is part of a series covering 'glue' knowledge. This is stuff that may be difficult to find in any normal training material. If you're a new developer or programmer you will hopefully find it useful. I try to explain more of the implementation side if it helps understanding.
This particular part of the series is about terminal and shell usage. I focus mostly on Linux-based shells.
Before moving on to other large topics, there are lots of bits and pieces of knowledge I haven't found a place for, which I want to cover now.
Permanent PATH changes, RC files
We've learned how to change our `PATH` environment variable in order to make executables available as commands. But you'll find, if you open a new terminal, that these changes are not persistent.
Every shell program has a way of making these kinds of changes apply to every shell you open. For Bash and Zsh (and others) this is done through rc files, where 'rc' stands for 'run commands'. These are specific files containing commands that run when a shell starts.
For Zsh, one file run when you log in to the shell is the `~/.zprofile` file.
Here is what mine looks like on my personal machine:
PATH="/Library/Frameworks/Python.framework/Versions/3.9/bin:${PATH}" export PATH
You can imagine this file as being typed into the terminal before you use it. Mine adds to my `PATH` and exports it. For Bash this file is `~/.bash_profile`. These files are the home for any path changes, or any other environment variables. They get executed for your 'log-in' shell. You can read about the nuances on Stack Overflow.
The value set to `PATH` above includes `${PATH}` at the end. This has the effect of adding to the existing path, just like `x = x + 1` adds to `x`.
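To see the effect, here's a quick illustration (the `~/tools` directory and the starting `PATH` value are made up for this example):
```
% echo "$PATH"
/usr/bin:/bin
% PATH="$HOME/tools:${PATH}"    # prepend our hypothetical directory
% echo "$PATH"
/Users/me/tools:/usr/bin:/bin
```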
The other common rc file is `~/.zshrc` for Zsh and `~/.bashrc` for Bash. These are run for each interactive shell you open, and are therefore a good place for things like aliases, which we talk about next.
These types of files typically begin with a dot (`.`) character. This is the Linux way of making 'hidden' files. By default `ls` will not show files starting with a `.`, but you can use `ls -a` to show them.
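For example (the file listing here is invented for illustration):
```
% ls
projects
% ls -a
.    ..    .zprofile    .zshrc    projects
```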
Aliases
Here is what my `~/.zshrc` looks like:
```
alias gs='git status'
alias gd='git diff'
alias gdc='git diff --cached'
```
This uses the `alias` built-in to define some 'shortcut' commands. Aliases are only available in interactive shells, not in scripts. The first line allows me to type `gs` to run `git status`. It is pretty much text substitution: if I type `gd --cached` it's effectively the same as typing `git diff --cached`. Aliases are only expanded for the command name, not in the middle of commands: `echo gs` doesn't expand the `gs`.
Aliases are not inherited by child shells like environment variables are:
```
% alias hi='echo hello'   # make new alias
% hi                      # try it out
hello
% zsh                     # start a shell inside the shell!
% hi
zsh: command not found: hi
```
(`%` indicates the shell prompt in zsh)
The `hi` alias isn't available in the new inner shell, but any exported environment variables would be. This is why aliases go in the `.zshrc` file (which runs for every interactive shell), and not in `.zprofile` (which runs only for a 'log-in' shell).
Special key combinations
^C
You've probably seen Control-C used to terminate running commands:
```
% sleep 100
^C
```
Here I run `sleep 100`, then shortly after press the key combination of the Control key and the C key at the same time. You can see that the shell printed `^C` to the terminal. This is a notation for key combinations: `^` represents the Control key, and `C` represents the C key (no surprise there).
Hitting `^C` sends the currently running command's process the interrupt signal (`SIGINT`), generally causing it to finish. `SIGINT` can be 'caught' by processes, allowing them to do clean-up before terminating, but they can also ignore it or handle it in a broken way. This is why `^C` doesn't always work how you expect. We'll talk more about this when we talk about `kill`.
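As a sketch of what 'catching' `SIGINT` looks like, here is a small shell script using the standard `trap` built-in (the temp file path is hypothetical):
```
#!/bin/sh
# On SIGINT (^C), run some clean-up, then exit with a non-zero status.
trap 'echo "caught SIGINT, cleaning up"; rm -f /tmp/my-temp-file; exit 1' INT
sleep 100   # press ^C while this runs to trigger the trap
```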
^D
Another useful key combination is `^D`, or Ctrl-D. This sends the 'end of file' character to the terminal. You can think of it as a way of 'finishing' your input, but you may have to hit it a couple of times.
(Aside: the low-level explanation is that `^D` seems to cause the `read` system call to return when encountered. I couldn't find sources to back this up though. If there is no extra data, then `read` will end up returning zero bytes, which signals that a stream is 'finished'.)
When talking about aliases we ran a shell within our shell, by executing the `zsh` command. When you do this it looks like not much has changed, but you're actually now 'one level deeper' in shells. You can tell your shell that you are finished with it by pressing `^D` with no command half-typed. This will exit your current shell, putting you back in your outer shell if there is one.
We can demonstrate this with a variable that is not exported, and is therefore not inherited:
```
% TEST=123   # Note: TEST is NOT exported, so child processes don't see it.
% zsh        # Enter an inner shell
% echo TEST is "$TEST"
TEST is
# Press ^D, exit the inner shell, back to our outer shell.
% echo TEST is $TEST
TEST is 123
# Press ^D again, our terminal closes entirely!
```
At the end of this we pressed `^D` in the outermost shell, and it closed our terminal, since there's no more outer shell to return to. If I have something typed at my shell but not yet executed, `^D` doesn't do anything; this is just how the shell treats half-typed commands. You can clear your half-typed command with `^C`, then do `^D`.
This same behaviour applies to other commands. If you run `python3` you get dropped into a Python REPL (read-eval-print loop), which is basically a shell where you type Python code. We can exit out of this with `^D` too. A similar thing happens with `node` and the JavaScript REPL.
```
% python3
Python 3.9.10
>>> print("Hello")
Hello
>>> ^D
% node
Welcome to Node.js v16.13.0.
> console.log("Hello")
Hello
undefined
>
% # Pressed ^D. Back in our shell.
```
^Z
This is known as the suspend character. It is similar to `^C` in that it 'exits' the currently running command and returns you to the shell, but `^Z` does not actually try to terminate the executable: it suspends it. This basically means putting it on pause and not giving it any execution time.
Suspended executables are still processes, and their execution can be resumed. If you hit `^Z` while running something, you can type the `fg` command (a built-in) to resume it and go back to not having shell access until the executable completes. `fg` stands for 'foreground'.
When treating processes like this, we usually refer to them as jobs. A job is a process that your shell is tracking; the job may currently be suspended, or running in the background while you do other shell things.
Jobs
You can see the jobs that the shell is juggling with the `jobs` command. The easiest way to get some jobs going is to use the 'control operator', i.e. an `&` character at the end of a command. The `sleep` command helps us demonstrate: `sleep` just sleeps for the provided number of seconds, then exits.
```
% sleep 60 &
[1] 85623
% jobs
[1]  + running    sleep 60
```
Here we've run the `sleep` command with `&` on the end. This immediately puts the command 'in the background', which just means it's running (not suspended like with `^Z`) behind the scenes without stopping you from using the shell.
Every job has a job ID, which the `jobs` command shows in square brackets. For our sleep command here it's `[1]`, so a job ID of 1. You can bring a job into the foreground using the `fg` command again. `fg` optionally takes a job spec to determine what to bring into the foreground. A job spec starts with the `%` character. An example is `%1`, which is a job spec for the job with ID 1. `%%` and `%` are shortcuts for the 'current' job (which is the last job put in the background). There are more job specs but these should keep you going.
So with our `sleep` that we put in the background with job ID 1, we can bring it back to the foreground with `fg %1`. It's also the 'current job', so plain `fg` or `fg %%` would do the same thing. Once it's in the foreground we can do `^C` to terminate it.
When we used `^Z`, what we did was put the current process in the background and suspend it. We can use `fg` to bring it back to the foreground, or we can use `bg` (background) to tell the process to continue executing in the background. `bg` also takes job specs.
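Putting `^Z` and `bg` together, a session might look like this (the output shown is representative of zsh, not copied from a real run):
```
% sleep 60
^Z
zsh: suspended  sleep 60
% bg %1
[1]  + continued  sleep 60
% jobs
[1]  + running    sleep 60
```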
We can use this combination to have a Python REPL open, leave it for a bit to do something in the terminal, then come back to it without reloading:
```
% python3
Python 3.9.10
>>> a = 123        # Set a variable.
# (pressing ^Z)
zsh: suspended  python3
% echo Hello       # do something useful.
Hello
% fg
[1]  + continued  python3
>>> a
123
# 'a' is still 123.
```
This can be useful if you quickly need to check something, but don't want to lose progress in whatever else you were doing.
The ps command
I've talked about processes already. Something I didn't talk about was process IDs. Every process has a unique numerical ID (caveat: per PID namespace, but I'm not going to go into that). This is similar to job IDs, except process IDs are for the entire system, not just your shell.
Similar to the `jobs` command, there is a `ps` command (`ps` for 'process status') to list processes. By default this only shows processes for your current terminal. You can see all processes, with extra information, with `ps -ef`. Most people learn a particular 'incantation' for `ps`; `ps -ef` is mine.
An example showing just the current shell:
```
% ps -f
  UID   PID  PPID   C STIME   TTY           TIME CMD
  501 67197 67196   0  8:44am ttys000    0:00.24 -zsh
```
We can see the `PID` column; this tells us the process ID of the given process.
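A common trick is to pipe `ps -ef` into `grep` to find a particular process's PID (the PIDs and output here are made up for illustration; note that `grep` matches its own process too):
```
% ps -ef | grep python3
  501 70231 67197   0  9:02am ttys000    0:00.05 python3
  501 70245 67197   0  9:02am ttys000    0:00.00 grep python3
```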
Kill
I told you that `^C` sends the `SIGINT` signal to the process of the current command. There are more signals that can be sent to processes, and this is generally done with the `kill` command. Running `kill <pid>` sends the default signal, `SIGTERM`, to the given process ID (PID).
The default `SIGTERM` signal tells a process that it should promptly terminate, but gives it a chance to do any clean-up it might want to do, like deleting any temporary files it created. The behaviour of processes sent `SIGTERM` is usually similar to the behaviour when sent `SIGINT` (like `^C`). The process can ignore this signal as well.
You can also use job specs with `kill`, i.e. things like `%1`. `kill` is useful for terminating processes that you no longer want running, for whatever reason.
If a process really is messed up, and not responding to `SIGTERM`, you can send `SIGKILL`. This signal never actually reaches the process, so the process cannot block or ignore it: the kernel itself will terminate the process. You can do this with `kill -SIGKILL <pid>` (or the equivalent `kill -9 <pid>`). This can cause problems if the process depends on 'lockfiles' to know it's the only instance of itself running (`apt` does this for example).
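A typical session might look like this (the PIDs and exact messages are invented for illustration):
```
% sleep 600 &
[1] 71234
% kill %1                 # polite: sends SIGTERM
[1]  + terminated  sleep 600
% sleep 600 &
[1] 71301
% kill -SIGKILL 71301     # forceful: the kernel ends the process
[1]  + killed     sleep 600
```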
Wait
If you have put a bunch of processes in the background and you wish to wait for them all to be done, the `wait` command does exactly that. `wait` accepts a list of PIDs or job specs, and will wait for all of them to complete before exiting itself. If you provide no arguments, it waits on all running jobs.
This is particularly useful in scripts if you wish to run several independent commands at once, but need them all to complete before continuing.
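Here's a minimal script sketch (the three task scripts are hypothetical):
```
#!/bin/sh
# Start three independent tasks in the background...
./fetch-data.sh &
./build-assets.sh &
./run-tests.sh &
# ...then block until all background jobs have finished.
wait
echo "all tasks complete"
```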
Source vs execute
Imagine you have an application that you're developing, and it requires a bunch of environment variables to be set up for configuration. You manually type out and export these variables over and over throughout development, and you rightly get fed up with it.
So you put together a script, `env.sh`, make it executable, and it looks like this:
```
export API_BASE_URL="http://localhost:8080/api"
export API_DB_CONNECTION="./storage/api.db"
export OTHER_SERVICE_URL="http://localhost:8090"
# ... etc
```
You execute it, and realise that because the execution happens in a new process, those environment variables only apply to that child process, and don't affect your current shell.
This is where sourcing a script comes in. When you source a script it runs almost as if you had typed it yourself in your own shell, rather than executing in a child process. This means all the environment variable setting will do what you expect.
You can source a script using the dot command, `. ./env.sh`, or use the perhaps more explanatory `source ./env.sh` (`source` isn't a standard command, but it is common). The file does not need to be executable for you to source it. This is essentially what happens with your `.zshrc` or `.bashrc` file.
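To see the difference side by side, assuming the `env.sh` from above:
```
% ./env.sh                 # executed in a child process
% echo "API_BASE_URL is $API_BASE_URL"
API_BASE_URL is
% . ./env.sh               # sourced into the current shell
% echo "API_BASE_URL is $API_BASE_URL"
API_BASE_URL is http://localhost:8080/api
```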
Pipes
A lot of the power of a shell comes from being able to quickly chain together lots of simple commands in order to do a complicated one-off task.
For example: I have a Git repository, and I want to know who has authored the most commits on master. After looking up some Git commands, I can run the following:
```
git log --pretty="%an" master | sort | uniq -c | sort -nr | head
```
For the Rust Git repository this prints:
```
28091 bors
 6724 bors[bot]
 5950 Ralph Zahn
 5520 Colleen Obrien
 5512 Nancy George
 5076 Isabella Cuellar
 4402 Raymond Ervin
 4213 Erica Spain
 4158 Stella Evans
 3904 Jesse Mount
```
(names have been replaced by generated names)
This:
- Prints out each commit as a single line, only printing the author name (`%an`).
- `sort` sorts that list alphabetically.
- `uniq` deduplicates lines that are the same next to each other, printing the count of duplicates (`-c`).
- The second `sort` sorts the new list, but this time by the number it finds (`-n`) rather than alphabetically, from largest to smallest (`-r`, for reverse).
- Prints only the first 10 lines with `head`.
We used the `git`, `sort`, `uniq`, and `head` commands to do this, and we piped (`|` is called a pipe) the output of each command into the next one. There is probably a way to make Git print this information in one command, but not all commands are going to have options for exactly what you want. Combining very generic commands can be incredibly quick and useful.
We can build up this pipe one command at a time to see more clearly what output we're manipulating. `git log --pretty="%an" master` prints out a very long list like this:
```
bors
bors
Ralph Zahn
Ralph Zahn
Ralph Zahn
Ralph Zahn
Isabella Cuellar
bors
Isabella Cuellar
bors
bors
Nancy George
bors
Isabella Cuellar
bors
Colleen Obrien
(output goes on for a long time)
```
This is a list of each commit on the master branch, but only printing the author, in chronological order. We want to count how many times each name appears; the first thing we do is sort the list alphabetically by adding `| sort`:
```
0e4ef622
0x0G
0x1793d1
0xd4d
0xflotus
0xflotus
0xflotus
0xrgb
0yoyoyo
0yoyoyo
(output goes on for a long time)
```
Clearly there are some interesting authors that have made it into the Rust codebase. You can see now that `0xflotus` has made 3 commits to master, because there are three lines with that author. We use `uniq` to deduplicate the list, adding the `-c` flag so that it prints how many lines with the same content were next to each other:
```
   1 0e4ef622
   1 0x0G
   1 0x1793d1
   1 0xd4d
   3 0xflotus
   1 0xrgb
   8 0yoyoyo
  20 1000teslas
  16 1011X
   4 111
(again goes on)
```
We do a final `sort -nr` to sort by the number it finds on each line, in reverse (largest first) order. We add a final `head` to only print the top 10 authors, giving us:
```
28091 bors
 6724 bors[bot]
 5950 Ralph Zahn
 5520 Colleen Obrien
 5512 Nancy George
 5076 Isabella Cuellar
 4402 Raymond Ervin
 4213 Erica Spain
 4158 Stella Evans
 3904 Jesse Mount
```
Let's take a step back and talk about what we're actually piping around here.
Standard in, out and error
Conventionally, processes have what are called standard-in, standard-out, and standard-error. These refer to streams of bytes that are input to or output from the process. Standard-in (`stdin`) is an input to the process. Standard-out (`stdout`) is the output for the 'expected' behaviour of the program if nothing is wrong. Finally, standard-error (`stderr`) is for 'unexpected' output like errors.
When we pipe one process to another, like `cat fruits.txt | grep apple`, we are streaming the `stdout` of the first command into the `stdin` of the second. A lot of commands that take files as an argument will accept input from `stdin` instead; this is up to the command to implement. Sometimes you might want `stderr` to also be piped to the next command, and this can be achieved with redirections.
When you call functions like `console.log` in JavaScript/Node, or `print` in Python, these output to `stdout`. You can write to `stderr` in Node with `console.error`, or even the lower-level `process.stderr.write`.
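Here's a quick demonstration of the separate streams, using redirections (which I won't fully cover here; `out.txt` is just a scratch file for the example):
```
% ls nonexistant > out.txt        # stdout goes to the file...
ls: nonexistant: No such file or directory
% # ...but the error still appeared, because it was written to stderr.
% ls nonexistant 2> /dev/null     # discard stderr: the error disappears
% ls nonexistant 2>&1 | grep ls   # merge stderr into stdout so the pipe sees it
ls: nonexistant: No such file or directory
```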
Exit status
Every process, once finished, has an 'exit status'. This is a number: zero represents 'all good', whereas anything non-zero indicates some sort of issue or other piece of information. The exit status of the last command is stored in `$?`:
```
% cat file.txt
hello
% echo $?
0
% cat nonexistant-file
cat: nonexistant-file: No such file or directory
% echo $?
1
```
This can be useful in scripts. Zero being 'success' might seem a bit weird, since programming languages generally treat 0 as false and any other number as true. It makes some sense here though: a command generally has only one way to succeed, but many ways to fail.
It's worth noting that a command failing part-way through a pipeline does not stop the pipeline, and the pipeline's own exit status will even mask that failure. This can be changed with `set -o pipefail` in Bash.
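For example (`false` always exits with a non-zero status, `true` with zero; this works in Bash, and in recent zsh too):
```
% false | true
% echo $?
0                 # the failure of `false` is masked by the pipeline
% set -o pipefail
% false | true
% echo $?
1                 # now the pipeline reports the failure
```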
Chaining commands
You can run multiple commands one after the other in one line:
```
command1; command2; command3
```
This runs each command regardless of whether any fail. You can chain them so each runs only if the previous one succeeds (exit status 0):
command1 && command2 && command3
You can chain them so each runs only if the previous one fails (non-zero exit status):
```
command1 || command2 || command3
```
You can use combinations of the above. These do not pipe input/output between commands.
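Some illustrative combinations (`notes.txt` is a hypothetical file with no TODOs in it):
```
% mkdir -p build && cd build                 # cd runs only if mkdir succeeded
% grep -q TODO notes.txt || echo "no TODOs"
no TODOs
```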
And more
There are still lots of little things to talk about, but this is long enough for now. Next time we will talk about command substitution, process substitution, brace expansion, and more.