Re: [RFC] net: add new socket option SO_SETNETNS

From: Alok Tiagi
Date: Thu Feb 02 2023 - 18:59:06 EST


On Thu, Feb 02, 2023 at 09:10:23PM +0100, Eric Dumazet wrote:
> On Thu, Feb 2, 2023 at 8:55 PM Alok Tiagi <aloktiagi@xxxxxxxxx> wrote:
> >
> > On Thu, Feb 02, 2023 at 09:48:10AM +0800, Hillf Danton wrote:
> > > On Wed, 1 Feb 2023 19:22:57 +0000 aloktiagi <aloktiagi@xxxxxxxxx>
> > > > @@ -1535,6 +1535,52 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
> > > >  		WRITE_ONCE(sk->sk_txrehash, (u8)val);
> > > >  		break;
> > > >
> > > > +	case SO_SETNETNS:
> > > > +	{
> > > > +		struct net *other_ns, *my_ns;
> > > > +
> > > > +		if (sk->sk_family != AF_INET && sk->sk_family != AF_INET6) {
> > > > +			ret = -EOPNOTSUPP;
> > > > +			break;
> > > > +		}
> > > > +
> > > > +		if (sk->sk_type != SOCK_STREAM && sk->sk_type != SOCK_DGRAM) {
> > > > +			ret = -EOPNOTSUPP;
> > > > +			break;
> > > > +		}
> > > > +
> > > > +		other_ns = get_net_ns_by_fd(val);
> > > > +		if (IS_ERR(other_ns)) {
> > > > +			ret = PTR_ERR(other_ns);
> > > > +			break;
> > > > +		}
> > > > +
> > > > +		if (!ns_capable(other_ns->user_ns, CAP_NET_ADMIN)) {
> > > > +			ret = -EPERM;
> > > > +			goto out_err;
> > > > +		}
> > > > +
> > > > +		/* check that the socket has never been connected or recently disconnected */
> > > > +		if (sk->sk_state != TCP_CLOSE || sk->sk_shutdown & SHUTDOWN_MASK) {
> > > > +			ret = -EOPNOTSUPP;
> > > > +			goto out_err;
> > > > +		}
> > > > +
> > > > +		/* check that the socket is not bound to an interface*/
> > > > +		if (sk->sk_bound_dev_if != 0) {
> > > > +			ret = -EOPNOTSUPP;
> > > > +			goto out_err;
> > > > +		}
> > > > +
> > > > +		my_ns = sock_net(sk);
> > > > +		sock_net_set(sk, other_ns);
> > > > +		put_net(my_ns);
> > > > +		break;
> > >
> > >	cpu 0				cpu 2
> > >	---				---
> > >					ns = sock_net(sk);
> > >	my_ns = sock_net(sk);
> > >	sock_net_set(sk, other_ns);
> > >	put_net(my_ns);
> > >					ns is invalid ?
> >
> > That is the reason we want the socket to be in an un-connected state. That
> > should help us avoid this situation.
>
> This is not enough....
>
> Another thread might look at sock_net(sk), for example from inet_diag
> or tcp timers
> (which can be fired even in un-connected state)
>
> Even UDP sockets can receive packets while being un-connected,
> and they need to deref the net pointer.
>
> Currently there is no protection about sock_net(sk) being changed on the fly,
> and the struct net could disappear and be freed.
>
> There are ~1500 uses of sock_net(sk) in the kernel, I do not think
> you/we want to audit all
> of them to check what could go wrong...

I agree, auditing all the uses of sock_net(sk) is not a feasible option. From my
exploration of the usage of sock_net(sk), it appeared that it might be safe to
swap a socket's net ns if it had never been connected, but I looked at only a
subset of such uses.

Introducing ref counting logic around every access of sock_net(sk) may help get
around this, but it involves a bigger change to increment and decrement the
count at every use of sock_net().

Any suggestions on whether this could be achieved in another way, much closer to
socket creation time? Or any comments on our workaround of injecting sockets
using seccomp addfd?
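
For comparison, the existing way to get a socket into another net ns at
creation time from userspace is to setns() into the target namespace, call
socket(), and switch back. Below is a minimal sketch of that pattern, not part
of the patch. The helper name socket_in_netns() is hypothetical, and the sketch
assumes the caller already holds an fd for the target netns and has the
privileges setns() requires. It does not cover our case where the socket is
created by an unmodified application.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): create a socket inside the
 * network namespace referred to by netns_fd, then move the calling thread
 * back to the namespace it started in.
 * Returns the new socket fd, or -1 with errno set on failure.
 */
int socket_in_netns(int netns_fd, int domain, int type, int protocol)
{
	int orig_fd, sock, saved_errno;

	/* Remember the calling thread's current netns so we can return to it. */
	orig_fd = open("/proc/thread-self/ns/net", O_RDONLY | O_CLOEXEC);
	if (orig_fd < 0)
		return -1;

	/* setns() only changes the namespace of the calling thread. */
	if (setns(netns_fd, CLONE_NEWNET) < 0) {
		saved_errno = errno;
		close(orig_fd);
		errno = saved_errno;
		return -1;
	}

	/* The socket is bound to the netns the thread is in at creation. */
	sock = socket(domain, type, protocol);
	saved_errno = errno;

	/* Always try to switch back, even if socket() failed. */
	if (setns(orig_fd, CLONE_NEWNET) < 0) {
		saved_errno = errno;
		if (sock >= 0)
			close(sock);
		sock = -1;
	}

	close(orig_fd);
	errno = saved_errno;
	return sock;
}
```

A caller would open something like /var/run/netns/<name> (or
/proc/<pid>/ns/net) and pass that fd in. Since the net ns is fixed when the
socket is created, this avoids changing sock_net(sk) on a live socket.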