Re: [RFC bpf-next] xsk: honor SO_BINDTODEVICE on bind

From: Magnus Karlsson
Date: Mon Jul 03 2023 - 05:49:02 EST


On Fri, 30 Jun 2023 at 16:58, Ilya Maximets <i.maximets@xxxxxxx> wrote:
>
> Initial creation of an AF_XDP socket requires CAP_NET_RAW capability.
> A privileged process might create the socket and pass it to a
> non-privileged process for later use. However, that process will be
> able to bind the socket to any network interface. Even though it will
> not be able to receive any traffic without modification of the BPF map,
> the situation is not ideal.
>
> Sockets already have a mechanism that can be used to restrict what
> interface they can be attached to. That is SO_BINDTODEVICE.
>
> To change the binding the process will need CAP_NET_RAW.
>
> Make xsk_bind() honor the SO_BINDTODEVICE in order to allow safer
> workflow when non-privileged process is using AF_XDP.

Rebinding an AF_XDP socket is not allowed today. Any such attempt will
return an error from bind. So if I understand the purpose of
SO_BINDTODEVICE correctly, you could say that this option is always
set for an AF_XDP socket and it is not possible to toggle it. The only
way to "rebind" an AF_XDP socket is to close it and open a new one.
This was a conscious design decision from day one as it would be very
hard to support this, especially in zero-copy mode.

> Signed-off-by: Ilya Maximets <i.maximets@xxxxxxx>
> ---
>
> Posting as an RFC for now to probably get some feedback.
> Will re-post once the tree is open.
>
> Documentation/networking/af_xdp.rst | 9 +++++++++
> net/xdp/xsk.c | 6 ++++++
> 2 files changed, 15 insertions(+)
>
> diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
> index 247c6c4127e9..1cc35de336a4 100644
> --- a/Documentation/networking/af_xdp.rst
> +++ b/Documentation/networking/af_xdp.rst
> @@ -433,6 +433,15 @@ start N bytes into the buffer leaving the first N bytes for the
> application to use. The final option is the flags field, but it will
> be dealt with in separate sections for each UMEM flag.
>
> +SO_BINDTODEVICE setsockopt
> +--------------------------
> +
> +This is a generic SOL_SOCKET option that can be used to tie AF_XDP
> +socket to a particular network interface. It is useful when a socket
> +is created by a privileged process and passed to a non-privileged one.
> +Once the option is set, kernel will refuse attempts to bind that socket
> +to a different interface. Updating the value requires CAP_NET_RAW.
> +
> XDP_STATISTICS getsockopt
> -------------------------
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 5a8c0dd250af..386ff641db0f 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -886,6 +886,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
> struct sock *sk = sock->sk;
> struct xdp_sock *xs = xdp_sk(sk);
> struct net_device *dev;
> + int bound_dev_if;
> u32 flags, qid;
> int err = 0;
>
> @@ -899,6 +900,11 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
> XDP_USE_NEED_WAKEUP))
> return -EINVAL;
>
> + bound_dev_if = READ_ONCE(sk->sk_bound_dev_if);
> +
> + if (bound_dev_if && bound_dev_if != sxdp->sxdp_ifindex)
> + return -EINVAL;
> +
> rtnl_lock();
> mutex_lock(&xs->mutex);
> if (xs->state != XSK_READY) {
> --
> 2.40.1
>
>