[PATCH net-next] netns: filter uevents correctly

From: Christian Brauner
Date: Wed Apr 04 2018 - 15:49:36 EST


commit 07e98962fa77 ("kobject: Send hotplug events in all network namespaces")

enabled sending hotplug events into all network namespaces back in 2010.
Over time the set of uevents that get sent into all network namespaces has
shrunk. We have now reached the point where hotplug events for all devices
that carry a namespace tag are filtered according to that namespace.

Specifically, they are filtered whenever the namespace tag of the kobject
does not match the namespace tag of the netlink socket. One example are
network devices. Uevents for network devices only show up in the network
namespaces these devices are moved to or created in.

However, any uevent for a kobject that does not have a namespace tag
associated with it will not be filtered and we will *try* to broadcast it
into all network namespaces.

The original patchset was written in 2010 before user namespaces were a
thing. With the introduction of user namespaces sending out uevents became
partially isolated as they were filtered by user namespaces:

net/netlink/af_netlink.c:do_one_broadcast()

if (!net_eq(sock_net(sk), p->net)) {
if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
return;

if (!peernet_has_id(sock_net(sk), p->net))
return;

if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
CAP_NET_BROADCAST))
j return;
}

The file_ns_capable() check will check whether the caller had
CAP_NET_BROADCAST at the time of opening the netlink socket in the user
namespace of interest. This check is fine in general but seems insufficient
to me when paired with uevents. The reason is that devices always belong to
the initial user namespace so uevents for kobjects that do not carry a
namespace tag should never be sent into another user namespace. This has
been the intention all along. But there's one case where this breaks,
namely if a new user namespace is created by root on the host and an
identity mapping is established between root on the host and root in the
new user namespace. Here's a reproducer:

sudo unshare -U --map-root
udevadm monitor -k
# Now change to initial user namespace and e.g. do
modprobe kvm
# or
rmmod kvm

will allow the non-initial user namespace to retrieve all uevents from the
host. This seems very anecdotal given that in the general case user
namespaces do not see any uevents and also can't really do anything useful
with them.

Additionally, it is now possible to send uevents from userspace. As such we
can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
namespace of the network namespace of the netlink socket) userspace process
make a decision what uevents should be sent.

This makes me think that we should simply ensure that uevents for kobjects
that do not carry a namespace tag are *always* filtered by user namespace
in kobj_bcast_filter(). Specifically:
- If the owning user namespace of the uevent socket is not init_user_ns the
event will always be filtered.
- If the network namespace the uevent socket belongs to was created in the
initial user namespace but was opened from a non-initial user namespace
the event will be filtered as well.
Put another way, uevents for kobjects not carrying a namespace tag are now
always only sent to the initial user namespace. The regression potential
for this is near to non-existent since user namespaces can't really do
anything with interesting devices.

Signed-off-by: Christian Brauner <christian.brauner@xxxxxxxxxx>
---
lib/kobject_uevent.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index 15ea216a67ce..cb98cddb6e3b 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -251,7 +251,15 @@ static int kobj_bcast_filter(struct sock *dsk, struct sk_buff *skb, void *data)
return sock_ns != ns;
}

- return 0;
+ /*
+ * The kobject does not carry a namespace tag so filter by user
+ * namespace below.
+ */
+ if (sock_net(dsk)->user_ns != &init_user_ns)
+ return 1;
+
+ /* Check if socket was opened from non-initial user namespace. */
+ return sk_user_ns(dsk) != &init_user_ns;
}
#endif

--
2.15.1