Re: [RFC] net: add new socket option SO_SETNETNS

From: Eric Dumazet
Date: Thu Feb 02 2023 - 15:10:46 EST


On Thu, Feb 2, 2023 at 8:55 PM Alok Tiagi <aloktiagi@xxxxxxxxx> wrote:
>
> On Thu, Feb 02, 2023 at 09:48:10AM +0800, Hillf Danton wrote:
> > On Wed, 1 Feb 2023 19:22:57 +0000 aloktiagi <aloktiagi@xxxxxxxxx>
> > > @@ -1535,6 +1535,52 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
> > >  		WRITE_ONCE(sk->sk_txrehash, (u8)val);
> > >  		break;
> > >
> > > +	case SO_SETNETNS:
> > > +	{
> > > +		struct net *other_ns, *my_ns;
> > > +
> > > +		if (sk->sk_family != AF_INET && sk->sk_family != AF_INET6) {
> > > +			ret = -EOPNOTSUPP;
> > > +			break;
> > > +		}
> > > +
> > > +		if (sk->sk_type != SOCK_STREAM && sk->sk_type != SOCK_DGRAM) {
> > > +			ret = -EOPNOTSUPP;
> > > +			break;
> > > +		}
> > > +
> > > +		other_ns = get_net_ns_by_fd(val);
> > > +		if (IS_ERR(other_ns)) {
> > > +			ret = PTR_ERR(other_ns);
> > > +			break;
> > > +		}
> > > +
> > > +		if (!ns_capable(other_ns->user_ns, CAP_NET_ADMIN)) {
> > > +			ret = -EPERM;
> > > +			goto out_err;
> > > +		}
> > > +
> > > +		/* check that the socket has never been connected or recently disconnected */
> > > +		if (sk->sk_state != TCP_CLOSE || sk->sk_shutdown & SHUTDOWN_MASK) {
> > > +			ret = -EOPNOTSUPP;
> > > +			goto out_err;
> > > +		}
> > > +
> > > +		/* check that the socket is not bound to an interface*/
> > > +		if (sk->sk_bound_dev_if != 0) {
> > > +			ret = -EOPNOTSUPP;
> > > +			goto out_err;
> > > +		}
> > > +
> > > +		my_ns = sock_net(sk);
> > > +		sock_net_set(sk, other_ns);
> > > +		put_net(my_ns);
> > > +		break;
> >
> >     cpu 0                           cpu 2
> >     ---                             ---
> >                                     ns = sock_net(sk);
> >     my_ns = sock_net(sk);
> >     sock_net_set(sk, other_ns);
> >     put_net(my_ns);
> >                                     ns is invalid ?
>
> That is the reason we want the socket to be in an un-connected state. That
> should help us avoid this situation.

This is not enough.

Another thread might look at sock_net(sk), for example from inet_diag
or from TCP timers (which can fire even in an unconnected state).

Even UDP sockets can receive packets while unconnected,
and they need to dereference the net pointer to do so.

Currently nothing protects against sock_net(sk) being changed on the fly,
so the old struct net could be freed while another thread is still using it.
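
To make the window concrete, here is a minimal user-space sketch that
replays that interleaving in a single thread. It is only a model:
struct net_model, struct sock_model, put_net_model() and replay_race()
are made-up stand-ins, not kernel code, and it assumes the socket holds
the only reference to its old netns.

```c
/* User-space model only: every name below is a hypothetical stand-in,
 * not a kernel API. */
#include <assert.h>
#include <stdbool.h>

struct net_model {
	int refcnt;	/* models the netns reference count */
	bool freed;	/* set instead of calling free(), so the model stays inspectable */
};

struct sock_model {
	struct net_model *net;	/* models the socket's netns pointer */
};

/* Models put_net(): drop one reference, "free" on the last one. */
static void put_net_model(struct net_model *net)
{
	if (--net->refcnt == 0)
		net->freed = true;
}

/* Replays the reader/setter interleaving sequentially.
 * Returns true if the reader ends up holding a pointer to a freed netns. */
static bool replay_race(void)
{
	/* Assumption: the socket owns the only reference to old_ns. */
	struct net_model old_ns = { .refcnt = 1, .freed = false };
	/* Reference that the setter obtained for the target netns. */
	struct net_model new_ns = { .refcnt = 1, .freed = false };
	struct sock_model sk = { .net = &old_ns };
	struct net_model *ns, *my_ns;

	/* Reader (diag, timer, rx path): loads the pointer, takes no reference. */
	ns = sk.net;

	/* Setter (the proposed setsockopt path): swap, then drop the old reference. */
	my_ns = sk.net;
	sk.net = &new_ns;
	put_net_model(my_ns);

	/* The reader now dereferences ns; in the kernel this would be a use-after-free. */
	return ns->freed;
}
```

In this model replay_race() returns true: the reader's pointer refers to
a netns whose last reference has already been dropped. Checking sk_state
before the swap does not change that, since the reader never looks at it.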

There are ~1500 uses of sock_net(sk) in the kernel. I do not think
you/we want to audit all of them to check what could go wrong...