Network interfaces for developers
If you look for information about network interfaces in Linux, you tend to get a lot of results from the perspective of system or network administrators. But there are things a software developer would find useful to know about day-to-day. I attempt to explain these here. First I'll explain at a high level, then in more gritty detail with Linux source code to back me up.
This is going to focus on running servers, in particular HTTP servers. I'll mostly stick to Linux and IPv4 for simplicity. Other operating systems have close equivalents, and IPv6 is similar.
So what are these useful things to know?
Sockets
To create a server you create a socket for communication. You bind this to an IP address and port, then listen on that socket to accept incoming connections, e.g. HTTP requests.
Binding
You can only bind to addresses the kernel considers to belong to the host. You can't just bind to the IP address of some random website.
Network interfaces
Network interfaces are Linux's way of representing where network packets can enter and leave the system. You almost certainly have several interfaces. They can correspond to physical hardware like your network card, or virtual software-defined interfaces.
There is a loopback interface which handles `127.0.0.0/8` addresses (this is CIDR notation), meaning `127.*.*.*` like `127.0.0.1`. These are private to the current host. No packets on these addresses leave the host or can enter from outside it[1].
Linux has network namespaces which isolate sets of network interfaces from each other. Docker leverages these namespaces. This means that a Docker container has its own network interfaces independent of the host, including its own loopback. This is why a server bound to `127.0.0.1` in your Docker container isn't accessible from the host.
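You can see this isolation without Docker. Here's a minimal sketch in C, assuming you run it as root and have iproute2's `ip` tool installed, that moves the process into a fresh network namespace with `unshare` and then lists the interfaces it can see:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Give this process its own network namespace, much as Docker does
     * for a container. It starts with only a fresh (down) loopback
     * interface, separate from the host's. Requires CAP_SYS_ADMIN. */
    if (unshare(CLONE_NEWNET) != 0) {
        perror("unshare");
        return 1;
    }

    /* List the interfaces visible in the new namespace; the host's
     * interfaces no longer appear. */
    execlp("ip", "ip", "addr", (char *)NULL);
    perror("execlp");
    return 1;
}
```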
0.0.0.0
When binding, there is a special address that can be used, `0.0.0.0`, which can receive packets arriving on any available network interface. This is typically used when you want the server accessible outside of the host.
In more detail
Let's talk about these topics in more detail. It isn't a one-to-one mapping with the titles above but covers the same ground.
Creating a server
If you want to create a server without higher-level libraries or frameworks, you need to interact with the kernel via system calls (syscalls). Beej's Guide to Network Programming is an excellent guide on doing just that.
There are only a few system calls required. First you need to make a socket with the `socket` syscall. A socket is a very generic object in Linux for communication between things. It looks a lot like a file in how you interact with it. You can read from it, write to it, and much more. For example, a socket can be created to make HTTP requests, send UDP packets, communicate with the Docker daemon, and even configure network interfaces. It can also be a way to receive TCP connections, and therefore act as a server.
To make our socket into a server, we need to tell the kernel the IP address and port we want it to have. We do this with `bind`. For example we can bind `127.0.0.1:8080` to our socket.
Finally we need to tell the kernel we'd like to accept incoming connections, making it a server. We do this with `listen`. We can then call `accept` to accept connections. There's a lot more to making a functional server, for which you can check out Beej's guide.
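Putting those syscalls together, here's a minimal sketch in C of a server bound to `127.0.0.1:8080` (error handling omitted for brevity; Beej's guide covers what a real server needs):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    /* A TCP/IPv4 socket. */
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    /* Bind it to 127.0.0.1:8080. */
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8080);
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
    bind(fd, (struct sockaddr *)&addr, sizeof addr);

    /* Tell the kernel we want to accept incoming connections. */
    listen(fd, SOMAXCONN);

    /* Block until a client connects; conn can then be read from and
     * written to much like a file. */
    int conn = accept(fd, NULL, NULL);

    close(conn);
    close(fd);
    return 0;
}
```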
Most languages will have higher-level libraries for doing all of this. For example we have Express for Node, where making a server is quite simple:
```javascript
const express = require('express')
const app = express()

app.get('/', (req, res) => {
  res.send('Hello World!')
})

app.listen(8080, "127.0.0.1", () => {})
```
That handy `listen` function does all of the work for us.
Python's FastAPI has a very similar example:
```python
from fastapi import FastAPI
import uvicorn

app = FastAPI()

@app.get("/")
def hello():
    return "Hello World!"

uvicorn.run(app, host="127.0.0.1", port=8080)
```
Who can talk to it?
Should connections from outside the host be possible?
If the answer is no (the outside world should not be able to access the server), you should bind to a loopback address such as `127.0.0.1`. This can also generally be referred to as `localhost`. Be aware that `localhost` may refer to the IPv6 equivalent, `::1`, which can cause issues if you bind to `localhost` but try to connect to `127.0.0.1`. There are many such loopback IP addresses available. In general people stick to `127.0.0.1` and bind to different ports if they wish to have multiple servers running. You can bind to `127.0.0.2:8080` and `127.0.0.3:8080` to no ill effect, other than `localhost` no longer referring to them.
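You can see the `localhost` ambiguity for yourself by resolving the name. Here's a minimal sketch in C using `getaddrinfo` (which addresses come back, and in which order, depends on your system's configuration):

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main(void) {
    struct addrinfo hints, *res, *p;
    memset(&hints, 0, sizeof hints);
    hints.ai_socktype = SOCK_STREAM;  /* any address family, TCP */

    if (getaddrinfo("localhost", "8080", &hints, &res) != 0)
        return 1;

    /* Print every address "localhost" resolves to; it may include
     * ::1 as well as (or ahead of) 127.0.0.1. */
    for (p = res; p != NULL; p = p->ai_next) {
        char buf[INET6_ADDRSTRLEN];
        const void *addr = (p->ai_family == AF_INET)
            ? (const void *)&((struct sockaddr_in *)p->ai_addr)->sin_addr
            : (const void *)&((struct sockaddr_in6 *)p->ai_addr)->sin6_addr;
        inet_ntop(p->ai_family, addr, buf, sizeof buf);
        printf("%s\n", buf);
    }

    freeaddrinfo(res);
    return 0;
}
```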
If the answer is yes (the outside world should be able to access the server), you'll generally want to bind to the special address `0.0.0.0`. As we will see later, the kernel will deliver connections arriving on any network interface to your socket, as long as everything else matches, such as the port.

More complicated networking setups can require more nuance, where you may want to bind to a given network card's IP address explicitly.
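In the C sketch from earlier, these choices are a one-line difference at `bind` time. `INADDR_ANY` is the kernel's constant for `0.0.0.0`, and the explicit address below is just an illustrative placeholder for one your host actually owns:

```c
/* Accept connections arriving on any interface (0.0.0.0:8080)... */
addr.sin_addr.s_addr = htonl(INADDR_ANY);

/* ...or only those addressed to one specific interface's address.
 * This address is an example; substitute one assigned to your host. */
inet_pton(AF_INET, "192.168.1.190", &addr.sin_addr);
```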
Servers in Docker containers
Docker requires some special attention here. Since a Docker container operates in its own network namespace, its localhost is different to the localhost of the Docker host. Even if you bind to `0.0.0.0:8080` inside a container, it is still not available outside the host, since it will only apply to the network interfaces inside the container.
However, Docker will set up a network interface on the host and on containers to link the two worlds. The `ip` command can be used to see various information about network interfaces in Linux. If we use `ip route` both on the host and in a container we get the following:
```
#
# On the host
#
host% ip route
default via 192.168.1.254 dev enp2s0 proto dhcp src 192.168.1.190 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
(trimmed output)

#
# In the container
#
container% ip route
default via 172.17.0.1 dev eth0
172.17.0.0/16 dev eth0 proto kernel scope link src 172.17.0.2
```
In the host's routes, we can see one for `docker0`. This is the network interface Docker created on our host to talk to containers. It tells us that the addresses `172.17.0.0/16` should be sent to this interface. In the container's routes we can see the `eth0` interface. It also tells us our container's IP address for this interface is `172.17.0.2`.
If we bind to `0.0.0.0:8080` in our container, or even bind to `172.17.0.2:8080` directly, then we can access our server from the host:
```
host% curl 172.17.0.2:8080
Hello World!
```
But if we bind to a loopback address like `127.0.0.1` then we would not be able to make this cURL request successfully, since the server would only be bound to the Docker container's loopback interface.
This doesn't provide us a way to expose the server running in a Docker container to the world outside of the host, since to do that our server would need to be bound to `0.0.0.0` on the host or one of the host's specific addresses.
Docker provides us a way to connect these through the `-p` or `--publish` flags, such as:

```
docker run -p 127.0.0.1:80:8080 nginx:alpine
```
When trying to parse these arguments it's good to remember the host is on the left of the colon. In `127.0.0.1:80:8080`, the `127.0.0.1:80` is referring to the host's network and `8080` is referring to the container's network.
This command causes Docker to bind to the host's `127.0.0.1:80` and forward any packets to/from the container's `8080` port. Inside the container these packets will be via the `eth0` interface we saw earlier. Because they're on `eth0`, they still won't be seen/sent by a server bound to localhost inside the container. The container's server would need to be bound to `0.0.0.0` for this forwarding to/from the host to work.
The shorthand version of the flag, `-p 80:8080`, will bind to `0.0.0.0` on the host, and will therefore expose the container's server to outside the host. This can easily be an accidental security hole.
Hopefully you have some firewall external to your host anyway, but either way it would be good practice to only bind to localhost if that is sufficient. You can also be explicit with `-p 0.0.0.0:80:8080`.
How does `0.0.0.0` work?
At this point, I'm going to start digging up Linux source code to show how this works. If you're happy that you understand the behaviour of `0.0.0.0` and that's all you need, this section isn't going to add much for you. But if you want a better idea of how this occurs in Linux, read on.
Our sockets, funnily enough, correspond to a `struct socket` in the kernel. If you're not familiar with C, this is a bit like a class:
```c
/**
 *  struct socket - general BSD socket
 *  @state: socket state (%SS_CONNECTED, etc)
 *  @type: socket type (%SOCK_STREAM, etc)
 *  @flags: socket flags (%SOCK_NOSPACE, etc)
 *  @ops: protocol specific socket operations
 *  @file: File back pointer for gc
 *  @sk: internal networking protocol agnostic socket representation
 *  @wq: wait queue for several uses
 */
struct socket {
	socket_state		state;
	short			type;
	unsigned long		flags;
	struct file		*file;
	struct sock		*sk;
	const struct proto_ops	*ops; /* Might change with IPV6_ADDRFORM or MPTCP. */
	struct socket_wq	wq;
};
```
C doesn't have classes, but it can mimic them manually by storing function pointers to act as the 'methods' on that class. You can see that in the `struct proto_ops` field. This stores a bunch of function pointers for things like `bind` for that particular protocol.
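As a toy illustration (not kernel code) of that pattern: which 'method' runs depends on which ops table the object points at, which is how the kernel dispatches `bind` to the right protocol implementation:

```c
/* A table of function pointers acting as the 'methods'. */
struct ops {
    int (*bind)(void *self, int port);
};

/* One protocol's implementation of the bind 'method'. */
static int tcp_bind(void *self, int port) {
    (void)self;
    (void)port;
    return 0;
}

static const struct ops tcp_ops = { .bind = tcp_bind };

/* The 'object' carries a pointer to its ops table. */
struct my_socket {
    const struct ops *ops;
};

static int do_bind(struct my_socket *s, int port) {
    return s->ops->bind(s, port);  /* dispatch via function pointer */
}
```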
For IPv4 sockets, this ends up calling the `__inet_bind` function in `af_inet.c`. This sets up a bunch of stuff for the socket object, but not too much relevant to us now. The `listen` syscall ends up calling `inet_csk_listen_start`:
```c
int inet_csk_listen_start(struct sock *sk)
{
	struct inet_connection_sock *icsk = inet_csk(sk);
	struct inet_sock *inet = inet_sk(sk);
	int err;

	err = inet_ulp_can_listen(sk);
	if (unlikely(err))
		return err;

	reqsk_queue_alloc(&icsk->icsk_accept_queue);

	sk->sk_ack_backlog = 0;
	inet_csk_delack_init(sk);

	/* There is race window here: we announce ourselves listening,
	 * but this transition is still not validated by get_port().
	 * It is OK, because this socket enters to hash table only
	 * after validation is complete.
	 */
	inet_sk_state_store(sk, TCP_LISTEN);
	err = sk->sk_prot->get_port(sk, inet->inet_num);
	if (!err) {
		inet->inet_sport = htons(inet->inet_num);

		sk_dst_reset(sk);
		err = sk->sk_prot->hash(sk);

		if (likely(!err))
			return 0;
	}

	inet_sk_set_state(sk, TCP_CLOSE);
	return err;
}
```
The important line here is `err = sk->sk_prot->hash(sk)`. This `hash` function takes aspects of our bound socket and stores it in a global kernel data structure for fast lookup later (a hash table). It actually does the storing rather than returning a hash of the socket. For IPv4 the hash function is `ipv4_portaddr_hash`:
```c
static inline u32 ipv4_portaddr_hash(const struct net *net,
				     __be32 saddr,
				     unsigned int port)
{
	return jhash_1word((__force u32)saddr, net_hash_mix(net)) ^ port;
}
```
You can see that the hash mostly uses the IP address `saddr` and the port to produce the hash. This means the kernel can look up our listening socket from just this information. When the kernel receives a packet, the destination IP and destination port can be used to look up our socket.

When the kernel receives a packet, it uses `__inet_lookup_listener` to look up the listener socket:
```c
struct sock *__inet_lookup_listener(const struct net *net,
				    struct inet_hashinfo *hashinfo,
				    struct sk_buff *skb, int doff,
				    const __be32 saddr, __be16 sport,
				    const __be32 daddr, const unsigned short hnum,
				    const int dif, const int sdif)
{
	struct inet_listen_hashbucket *ilb2;
	struct sock *result = NULL;
	unsigned int hash2;

	/* Lookup redirect from BPF */
	if (static_branch_unlikely(&bpf_sk_lookup_enabled) &&
	    hashinfo == net->ipv4.tcp_death_row.hashinfo) {
		result = inet_lookup_run_sk_lookup(net, IPPROTO_TCP, skb, doff,
						   saddr, sport, daddr, hnum, dif,
						   inet_ehashfn);
		if (result)
			goto done;
	}

	hash2 = ipv4_portaddr_hash(net, daddr, hnum);
	ilb2 = inet_lhash2_bucket(hashinfo, hash2);

	result = inet_lhash2_lookup(net, ilb2, skb, doff,
				    saddr, sport, daddr, hnum,
				    dif, sdif);
	if (result)
		goto done;

	/* Lookup lhash2 with INADDR_ANY */
	hash2 = ipv4_portaddr_hash(net, htonl(INADDR_ANY), hnum);
	ilb2 = inet_lhash2_bucket(hashinfo, hash2);

	result = inet_lhash2_lookup(net, ilb2, skb, doff,
				    saddr, sport, htonl(INADDR_ANY), hnum,
				    dif, sdif);
done:
	if (IS_ERR(result))
		return NULL;
	return result;
}
```
We can see the use of the very same hash function `ipv4_portaddr_hash`, which is then looked up in two stages by `inet_lhash2_bucket` and `inet_lhash2_lookup`. You can think of this as just a hash table lookup.
This hashing and lookup happens up to twice. The first time, the destination address and port are used in the lookup. This means that if we bound our server to an explicit address such as `127.0.0.1:8080` then this first lookup will succeed if this is the packet's destination.

But this first lookup would fail if we bound our server to `0.0.0.0:8080`, since the hash of `0.0.0.0` would not match the hash of the destination address. A second lookup is then performed, this time with the address portion set to `INADDR_ANY`, which is a constant referring to `0.0.0.0`, and the destination port. This time, our server bound to `0.0.0.0:8080` will be found, and the socket will receive the packet, assuming all else is good.
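Stripped of kernel details, the overall logic looks something like this sketch, where `hash_lookup` is a hypothetical stand-in for the `inet_lhash2_bucket` + `inet_lhash2_lookup` steps above:

```c
#include <arpa/inet.h>  /* htonl, INADDR_ANY */
#include <stdint.h>

struct sock;

/* Hypothetical shorthand for the kernel's bucket-and-compare steps. */
struct sock *hash_lookup(uint32_t addr, uint16_t port);

/* Try the packet's exact destination address first, then fall back
 * to the 0.0.0.0 wildcard entry. */
struct sock *lookup_listener(uint32_t daddr, uint16_t dport)
{
    /* Matches e.g. a socket bound to 127.0.0.1:8080. */
    struct sock *sk = hash_lookup(daddr, dport);
    if (sk)
        return sk;

    /* Matches e.g. a socket bound to 0.0.0.0:8080. */
    return hash_lookup(htonl(INADDR_ANY), dport);
}
```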
This captures much of the core logic for deciding which socket a packet arrives on. There isn't much reference to network interfaces here, though the lookup function does consider network interfaces and scores candidate sockets with them in mind.
Conclusion
This covers a bunch of information about network interfaces and servers that I think is helpful for a software developer. When I set out to write this my understanding was very implicit. I understood the behaviour to some extent, but had no idea of the actual mechanisms.
Researching this made these concepts more concrete and I learned a lot along the way. There are bits I did not include, such as exactly what circumstances need to be met for the kernel to allow you to bind to a given address, or that DHCP ultimately causes your device to learn about its IP address on the local network and automatically configures that network interface.
I hope you found it useful.
Footnotes
1. Not without further shenanigans such as port forwarding, at least.