
Namespaces in operation, part 7: Network namespaces


By Jake Edge
January 22, 2014
Namespaces in operation

It's been a while since last we looked at Linux namespaces. Our series has been missing a piece that we are finally filling in: network namespaces. As the name would imply, network namespaces partition the use of the network—devices, addresses, ports, routes, firewall rules, etc.—into separate boxes, essentially virtualizing the network within a single running kernel instance. Network namespaces entered the kernel in 2.6.24, six years ago; it took something approaching a year before they were ready for prime time. Since then, they seem to have been largely ignored by many developers.

Basic network namespace management

As with the others, network namespaces are created by passing a flag to the clone() system call: CLONE_NEWNET. From the command line, though, it is convenient to use the ip networking configuration tool to set up and work with network namespaces. For example:

    # ip netns add netns1

This command creates a new network namespace called netns1. When the ip tool creates a network namespace, it will create a bind mount for it under /var/run/netns; that allows the namespace to persist even when no processes are running within it and facilitates the manipulation of the namespace itself. Since network namespaces typically require a fair amount of configuration before they are ready for use, this feature will be appreciated by system administrators.

The "ip netns exec" command can be used to run network management commands within the namespace:

    # ip netns exec netns1 ip link list
    1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT 
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

This command lists the interfaces visible inside the namespace. A network namespace can be removed with:

    # ip netns delete netns1

This command removes the bind mount referring to the given network namespace. The namespace itself, however, will persist for as long as any processes are running within it.

Network namespace configuration

New network namespaces will have a loopback device but no other network devices. Aside from the loopback device, each network device (physical or virtual interfaces, bridges, etc.) can only be present in a single network namespace. In addition, physical devices (those connected to real hardware) cannot be assigned to namespaces other than the root. Instead, virtual network devices (e.g. virtual ethernet or veth) can be created and assigned to a namespace. These virtual devices allow processes inside the namespace to communicate over the network; it is the configuration, routing, and so on that determine who they can communicate with.

When first created, the lo loopback device in the new namespace is down, so even a loopback ping will fail:

    # ip netns exec netns1 ping 127.0.0.1
    connect: Network is unreachable
Bringing that interface up will allow pinging the loopback address:

    # ip netns exec netns1 ip link set dev lo up
    # ip netns exec netns1 ping 127.0.0.1
    PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
    64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.051 ms
    ...

But that still doesn't allow communication between netns1 and the root namespace. To do that, virtual ethernet devices need to be created and configured:

    # ip link add veth0 type veth peer name veth1
    # ip link set veth1 netns netns1

The first command sets up a pair of connected virtual ethernet devices. Packets sent to veth0 will be received by veth1 and vice versa. The second command assigns veth1 to the netns1 namespace. Next, IP addresses are set for the two devices, and they are brought up:

    # ip netns exec netns1 ifconfig veth1 10.1.1.1/24 up
    # ifconfig veth0 10.1.1.2/24 up

Communication in both directions is now possible, as the ping commands below show:

    # ping 10.1.1.1
    PING 10.1.1.1 (10.1.1.1) 56(84) bytes of data.
    64 bytes from 10.1.1.1: icmp_seq=1 ttl=64 time=0.087 ms
    ...

    # ip netns exec netns1 ping 10.1.1.2
    PING 10.1.1.2 (10.1.1.2) 56(84) bytes of data.
    64 bytes from 10.1.1.2: icmp_seq=1 ttl=64 time=0.054 ms
    ...

As mentioned, though, namespaces do not share routing tables or firewall rules, as running route and iptables -L in netns1 will attest.

    # ip netns exec netns1 route
    # ip netns exec netns1 iptables -L

The first will simply show a route for packets to the 10.1.1 subnet (using veth1), while the second shows that no iptables rules are configured. All of that means that packets sent from netns1 to the internet at large will get the dreaded "Network is unreachable" message. There are several ways to connect the namespace to the internet if that is desired. A bridge can be created in the root namespace with the veth device from netns1 attached to it. Alternatively, IP forwarding coupled with network address translation (NAT) could be configured in the root namespace. Either of those (and there are other configuration possibilities) will allow packets from netns1 to reach the internet and replies to be received in netns1.

Non-root processes that are assigned to a namespace (via clone(), unshare(), or setns()) only have access to the networking devices and configuration that have been set up in that namespace—root can add new devices and configure them, of course. Using the ip netns sub-command, there are two ways to address a network namespace: by its name, like netns1, or by the process ID of a process in that namespace. Since init generally lives in the root namespace, one could use a command like:

    # ip link set vethX netns 1

That would put a (presumably newly created) veth device into the root namespace, and it would work for a root user from any other namespace. In situations where it is not desirable to allow root to perform such operations from within a network namespace, the PID and mount namespace features can be used to make the other network namespaces unreachable.

Uses for network namespaces

As we have seen, a namespace's networking can range from none at all (or just loopback) to full access to the system's networking capabilities. That leads to a number of different use cases for network namespaces.

By essentially turning off the network inside a namespace, administrators can ensure that processes running there will be unable to make connections outside of the namespace. Even if a process is compromised through some kind of security vulnerability, it will be unable to perform actions like joining a botnet or sending spam.

Even processes that handle network traffic (a web server worker process or web browser rendering process, for example) can be placed into a restricted namespace. Once a connection is established by or to the remote endpoint, the file descriptor for that connection could be handled by a child process that is placed in a new network namespace created by a clone() call. The child would inherit its parent's file descriptors and thus have access to the connected descriptor. Another possibility would be for the parent to send the connected file descriptor to a process in a restricted network namespace via a Unix socket. In either case, the lack of suitable network devices in the namespace would make it impossible for the child or worker process to make additional network connections.

Namespaces could also be used to test complicated or intricate networking configurations, all on a single box. Running sensitive services in a more locked-down, firewall-restricted namespace is another use. Obviously, container implementations also use network namespaces to give each container its own view of the network, untrammeled by processes outside of the container. And so on.

Namespaces in general provide a way to partition system resources and to isolate groups of processes from each other's resources. Network namespaces are more of the same, but since networking is a sensitive area for security flaws, providing network isolation of various sorts is particularly valuable. Of course, using multiple namespace types together can provide even more isolation for both security and other needs.


Index entries for this article
Kernel/Namespaces



Namespaces in operation, part 7: Network namespaces

Posted Jan 23, 2014 3:27 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

Yay! Thanks for the walkthrough. I'll have to try stuffing a tmux session into a network namespace to use with a VPN, instead of forcing all of my traffic over it.

Namespaces in operation, part 7: Network namespaces

Posted Jan 23, 2014 4:43 UTC (Thu) by mikemol (guest, #83507) [Link] (1 responses)

Oh, for pete's sake. I've been wanting a way to do multiple virtual routers on the same machine for months...and this has been there the entire time? I've got an ugly mess of source-specific routing rules this would have cleaned up nicely...though I doubt Sanewall has support for namespaces at this time...

Namespaces in operation, part 7: Network namespaces

Posted Jan 23, 2014 22:13 UTC (Thu) by ebiederm (subscriber, #35028) [Link]

Which is a large part of the point of ip netns exec. It allows you to run unmodified code in a network namespace.

Namespaces in operation, part 7: Network namespaces

Posted Jan 23, 2014 8:30 UTC (Thu) by nicollet (subscriber, #37185) [Link] (3 responses)

Is there a way to do layer3 devices like openVZ venet devices ?

Cheers,

Namespaces in operation, part 7: Network namespaces

Posted Jan 23, 2014 11:03 UTC (Thu) by nelljerram (subscriber, #12005) [Link] (1 responses)

I'm not sure I fully understand the behaviour of a venet device (from http://openvz.org/Virtual_network_device). But I believe that a veth device effectively becomes layer 3 if you add an IP address to it and don't plug it into a bridge. Does that help at all?

Namespaces in operation, part 7: Network namespaces

Posted Jan 23, 2014 14:34 UTC (Thu) by nicollet (subscriber, #37185) [Link]

Not really, from a performance point of view, venet devices seem better.

see: http://html5tv.rot13.org/media/lpc2009-Network_namespaces...

Namespaces in operation, part 7: Network namespaces

Posted Jan 23, 2014 21:40 UTC (Thu) by ebiederm (subscriber, #35028) [Link]

There are macvlan devices which are similar in usage.

Namespaces in operation, part 7: Network namespaces

Posted Jan 23, 2014 9:05 UTC (Thu) by mbizon (subscriber, #37138) [Link] (5 responses)

> In addition, physical devices (those connected to real hardware) cannot be assigned to namespaces other than the root

yes they can, unless you meant something else by "real hardware"

I can move eth0, my actual laptop ethernet device, to another namespace without problem.

Namespaces in operation, part 7: Network namespaces

Posted Jan 23, 2014 13:25 UTC (Thu) by hmh (subscriber, #3838) [Link] (3 responses)

AFAIK, anything that shows up separately in "ip link" can be moved, except for the loopback. You can move physical devices and even individual VLANs (sub-interfaces) of a physical device to other network namespaces, independently.

Now, I haven't tried fun stuff like moving a tunnel to a namespace where the underlying device isn't present, etc. Or moving a physical device which has vlans spread over several namespaces. Or any other number of netns manipulations that would touch boundary conditions. It might work. It might go bonkers. It might drink your beer or do something unspeakable to your pet.

I recall netns didn't work well for the stuff "tc" interacts with (traffic policer/shaper/classifier) in an earlier 3.0 kernel, for example. Yeah, that was way back in 12/2011, so chances are someone fixed it already.

Network namespaces can be quite "interesting" in the ancient Chinese curses/proverbs sense: the thing has more boundary conditions than the number of D&D D20 dice you'd find in one of the country-wide roleplaying game conventions during the golden age of tabletop RPG gaming...

Namespaces in operation, part 7: Network namespaces

Posted Jan 23, 2014 13:36 UTC (Thu) by mbizon (subscriber, #37138) [Link] (2 responses)

There is a flag on some netdevices (NETIF_F_NETNS_LOCAL) that prevents them from being moved.

It is present on special devices like lo, but also on some virtual/tunnel devices like ppp.

The rationale for the latter is that packets sent on these interfaces may cross namespace boundaries, and that requires special handling (resetting skb state). So until the codepath has been audited/fixed, the flag is set.

Namespaces in operation, part 7: Network namespaces

Posted Jan 23, 2014 15:03 UTC (Thu) by johill (subscriber, #25196) [Link]

FWIW, we also set it on wireless interfaces and only allow moving all virtual interfaces for the same hardware together, otherwise we'd get into very strange cross-netns issues

Namespaces in operation, part 7: Network namespaces

Posted Dec 2, 2015 13:38 UTC (Wed) by sourcejedi (guest, #45153) [Link]

I don't see why F_NETNS_LOCAL would be set on ppp, and I don't think it is. In fact it doesn't seem to be set on tunnel interfaces generally. (Unless you enable "metadata collection", whatever that is). The code explicitly records the netns the tunnel was created in and uses it in the xmit routine. Which is great because that's probably what you want if you move a tunnel into a network namespace. I haven't tested it yet though :).

http://lxr.free-electrons.com/ident?i=NETIF_F_NETNS_LOCAL

Namespaces in operation, part 7: Network namespaces

Posted Jun 2, 2016 6:46 UTC (Thu) by ravi239 (guest, #109082) [Link]

I observed that on moving eth0 to another namespace, ip6tnl0 (tunnel) interface is also moved automatically along with eth0.
Can we restrict ip6tnl0 movement and allow only eth0 to be moved to another namespace ?

Namespaces in operation, part 7: Network namespaces

Posted Jan 23, 2014 13:57 UTC (Thu) by zuki (subscriber, #41808) [Link] (7 responses)

> they seem to have been largely ignored by many developers

FWIW, systemd has had PrivateNetwork= for a long while and recently gained JoinsNamespaceOf=. The first one is fairly heavily used.

Namespaces in operation, part 7: Network namespaces

Posted Jan 23, 2014 15:21 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (6 responses)

Hmm…would this work (all under a vpn-work.target directory):

- vpn-work-namespace.service with PrivateNetwork=true;
- vpn-work-setup@.service which is After=vpn-work-namespace.service and Before=vpn-work.service which migrates the %i device into the namespace created in the above service, creates a bridge and clones the routing tables;
- vpn-work.service JoinsNamespaceOf=vpn-work-namespace.service which starts up the VPN with the proper configuration file.

I can then start openvpn in a new network service with "systemctl start vpn-work.target". Also, how does /etc/resolv.conf work here?

The question I'm really interested in, assuming the above works: can my user have a unit file which JoinsNamespaceOf= to a system target (so that I can, as a user, start up apps in the VPN environment). Maybe system services would need a AllowUsersIntoNamespace=true option for this to work.

Namespaces in operation, part 7: Network namespaces

Posted Jan 23, 2014 16:24 UTC (Thu) by zuki (subscriber, #41808) [Link] (5 responses)

> I can then start openvpn in a new network service with "systemctl start
> vpn-work.target".
I imagine that this should work...

> Also, how does /etc/resolv.conf work here?
Unfortunately resolv.conf is global. There are some plans to replace it with something dynamic that can return different results for different interfaces, but afaik, nothing's been done yet.

> The question I'm really interested in, assuming the above works: can my
> user have a unit file which JoinsNamespaceOf= to a system target
No, the user systemd instance is unprivileged and cannot manipulate any namespace stuff right now. Though it would be really nice to add this kind of functionality.

Namespaces in operation, part 7: Network namespaces

Posted Jan 23, 2014 16:31 UTC (Thu) by mbizon (subscriber, #37138) [Link] (3 responses)

> Unfortunately resolv.conf is global. There are some plans to replace it with something dynamic that can return different results for different interfaces, but afaik, nothing's been done yet.

when you use "ip netns exec NS ...", the process is launched such that /etc/resolv.conf is a bind mount of /etc/netns/<NS>/resolv.conf

see the manual page of ip(8) for more details

Namespaces in operation, part 7: Network namespaces

Posted Jan 23, 2014 16:34 UTC (Thu) by zuki (subscriber, #41808) [Link] (2 responses)

Good to know, thanks. Systemd should probably do something similar.

Namespaces in operation, part 7: Network namespaces

Posted Jan 23, 2014 16:37 UTC (Thu) by rahulsundaram (subscriber, #21946) [Link] (1 responses)

Namespaces in operation, part 7: Network namespaces

Posted Jan 23, 2014 16:41 UTC (Thu) by zuki (subscriber, #41808) [Link]

"intend to" — yes. Done anything — unfortunately no. Tom Gundersen's commit that you linked to creates a very classic resolv.conf file. It is created in /run/... and has to be manually symlinked to in /etc/ to be seen by resolvers.

Namespaces in operation, part 7: Network namespaces

Posted Jan 23, 2014 16:32 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

> Unfortunately resolv.conf is global.

Yeah, this is what I feared. I guess I could just stick the VPN DNS servers at the bottom of the list and have them just take longer to resolve.

> No, the user systemd instance is unprivileged and cannot manipulate any namespace stuff right now. Though it would be really nice to add this kind of functionality.

I presume it works in tandem with PID 1 though. Maybe stuffing some of the relevant namespace FDs over some sockets would work here (as allowed by the .service defining the namespace(s)). Maybe something like "ExposeNamespaceToUsers=uid:500;gid:1000;group:vpn,wheel" in the top-level service?

Namespaces in operation, part 7: Network namespaces

Posted Jan 23, 2014 19:25 UTC (Thu) by raven667 (subscriber, #5198) [Link]

I should point out that there is a common framework used in networking circles that uses network namespaces for testing called Mininet http://mininet.org/ which is a python application that can create complex topologies using OpenVSwitch and OpenFlow to connect them together.

Namespaces in operation, part 7: Network namespaces

Posted Jan 29, 2014 10:54 UTC (Wed) by amarao (guest, #87073) [Link]

I found rather funny behaviour of the ip netns command. If a tun interface (from openvpn) is assigned to a network namespace, it loses its own IP.

I think this is a feature, but a rather strange one, because restoring the IP on the openvpn interface requires some hacks and tricks in scripts (to get the old IP, to reassign it back).

Namespaces in operation, part 7: Network namespaces

Posted Jan 30, 2014 11:39 UTC (Thu) by kevinm (guest, #69913) [Link] (1 responses)

Do network namespaces also partition AF_UNIX sockets?

Namespaces in operation, part 7: Network namespaces

Posted Jan 30, 2014 12:22 UTC (Thu) by johill (subscriber, #25196) [Link]

The code looks like it should:
static struct sock *__unix_find_socket_byname(struct net *net,
                                              struct sockaddr_un *sunname,
                                              int len, int type, unsigned int hash)
{
        struct sock *s;

        sk_for_each(s, &unix_socket_table[hash ^ type]) {
                struct unix_sock *u = unix_sk(s);

                if (!net_eq(sock_net(s), net))
                        continue;
[...]

Namespaces in operation, part 7: Network namespaces

Posted Jan 18, 2016 10:50 UTC (Mon) by axlmac (guest, #106395) [Link] (2 responses)

Hi all,

I saw that interface indexing is local to namespaces and, when trying to understand a bit more by assigning interfaces created in one namespace to another one, I saw that rtnetlink prevented me from doing that because the interface I tried to assign had the same index as another interface already present in the target namespace:

RTNETLINK answers: File exists

Is there a way to assign ranges to the indexes of interfaces (by blocks of 1000 for instance) within a certain namespace so that they don't overlap if moved? This might also give info on which namespace the interface was created in (taking for granted that namespaces have an index).

The question came up into my mind while creating a veth interface pair where the interfaces are created in a namespace and then one of them is associated/moved to another one.

Thanks Alex

Namespaces in operation, part 7: Network namespaces

Posted Jan 18, 2016 11:50 UTC (Mon) by paulj (subscriber, #341) [Link] (1 responses)

Create them in the default namespace (or at least, always create in a common namespace) and then assign to where ever needed?

Namespaces in operation, part 7: Network namespaces

Posted Jan 18, 2016 19:55 UTC (Mon) by axlmac (guest, #106395) [Link]

Well, that's a good idea if all the interfaces were created by a central authority in the main namespace, but if you terminate many ppp connections in a namespace you'll get dozens of pppX interfaces, and if the one you have just created in the main one has the index of one of those ppp connections I think you may get the same error.
Also, if some namespaces are managed by tenants (in terms of OpenStack.org) they may have already created other interfaces without the central authority knowing it.

Namespaces in operation, part 7: Network namespaces

Posted Feb 3, 2016 1:09 UTC (Wed) by axlmac (guest, #106395) [Link] (1 responses)

When speaking about veth interfaces, is there a way to know which namespace the other end of a veth pair is in? How does the kernel know that? I've searched the Internet for this info for almost an hour with no luck. So far I can retrieve the index of the other end of a pair and nothing else ("ip -d link show dev vethX")

Thanks, Alex

Namespaces in operation, part 7: Network namespaces

Posted Feb 15, 2019 8:05 UTC (Fri) by mkerrisk (subscriber, #1978) [Link]

Namespaces in operation, part 7: Network namespaces

Posted Jan 11, 2023 16:09 UTC (Wed) by vinipsmaker (guest, #126735) [Link]

> Once a connection is established by or to the remote endpoint, the file descriptor for that connection could be handled by a child process that is placed in a new network namespace created by a clone() call. The child would inherit its parent's file descriptors, thus have access to the connected descriptor. Another possibility would be for the parent to send the connected file descriptor to a process in a restricted network namespace via a Unix socket.

I was writing some test code to build a sandbox for my application and I tried just this, but it doesn't work.

1. Unprivileged process creates a new process in a new user+network namespace.
2. Let's say the parent process is the supervisor and the child process is the guest.
3. The supervisor creates a TCP connection to www.google.com:443 and sends the connected socket to the guest through SCM_RIGHTS.
4. The guest successfully receives the file descriptor (it is not -1 and it is properly allocated on the process table).
5. Any operation on that socket from the guest will error with EBADF (for other types of file descriptors such as a pipe it'll work just fine).

I guess I'll just have to create proxy services that send an AF_UNIX socket (these sockets work fine) and perform all the operations on the guest's behalf.


Copyright © 2014, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds