Re: [PATCH bpf-next v4 02/10] bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc

From: Daniel Xu
Date: Fri Dec 08 2023 - 19:08:03 EST


On Thu, Dec 07, 2023 at 01:21:11PM -0800, Eyal Birger wrote:
> On Mon, Dec 4, 2023 at 12:57 PM Daniel Xu <dxu@xxxxxxxxx> wrote:
> >
> > This commit adds an unstable kfunc helper to access internal xfrm_state
> > associated with an SA. This is intended to be used for the upcoming
> > IPsec pcpu work to assign special pcpu SAs to a particular CPU. In other
> > words: for custom software RSS.
> >
> > That being said, the function that this kfunc wraps is fairly generic
> > and used for a lot of xfrm tasks. I'm sure people will find uses
> > elsewhere over time.
> >
> > Co-developed-by: Antony Antony <antony.antony@xxxxxxxxxxx>
> > Signed-off-by: Antony Antony <antony.antony@xxxxxxxxxxx>
> > Signed-off-by: Daniel Xu <dxu@xxxxxxxxx>
> > ---
> > include/net/xfrm.h | 9 ++++
> > net/xfrm/xfrm_bpf.c | 102 +++++++++++++++++++++++++++++++++++++++++
> > net/xfrm/xfrm_policy.c | 2 +
> > 3 files changed, 113 insertions(+)
> >
> > diff --git a/include/net/xfrm.h b/include/net/xfrm.h
> > index c9bb0f892f55..1d107241b901 100644
> > --- a/include/net/xfrm.h
> > +++ b/include/net/xfrm.h
> > @@ -2190,4 +2190,13 @@ static inline int register_xfrm_interface_bpf(void)
> >
> > #endif
> >
> > +#if IS_ENABLED(CONFIG_DEBUG_INFO_BTF)
> > +int register_xfrm_state_bpf(void);
> > +#else
> > +static inline int register_xfrm_state_bpf(void)
> > +{
> > + return 0;
> > +}
> > +#endif
> > +
> > #endif /* _NET_XFRM_H */
> > diff --git a/net/xfrm/xfrm_bpf.c b/net/xfrm/xfrm_bpf.c
> > index 3d3018b87f96..3d6cac7345ca 100644
> > --- a/net/xfrm/xfrm_bpf.c
> > +++ b/net/xfrm/xfrm_bpf.c
> > @@ -6,9 +6,11 @@
> > */
> >
> > #include <linux/bpf.h>
> > +#include <linux/btf.h>
> > #include <linux/btf_ids.h>
> >
> > #include <net/dst_metadata.h>
> > +#include <net/xdp.h>
> > #include <net/xfrm.h>
> >
> > #if IS_BUILTIN(CONFIG_XFRM_INTERFACE) || \
> > @@ -112,3 +114,103 @@ int __init register_xfrm_interface_bpf(void)
> > }
> >
> > #endif /* xfrm interface */
> > +
> > +/* bpf_xfrm_state_opts - Options for XFRM state lookup helpers
> > + *
> > + * Members:
> > + * @error - Out parameter, set for any errors encountered
> > + * Values:
> > + * -EINVAL - netns_id is less than -1
> > + * -EINVAL - opts__sz isn't BPF_XFRM_STATE_OPTS_SZ
> > + * -ENONET - No network namespace found for netns_id
> > + * @netns_id - Specify the network namespace for lookup
> > + * Values:
> > + * BPF_F_CURRENT_NETNS (-1)
> > + * Use namespace associated with ctx
> > + * [0, S32_MAX]
> > + * Network Namespace ID
> > + * @mark - XFRM mark to match on
> > + * @daddr - Destination address to match on
> > + * @spi - Security parameter index to match on
> > + * @proto - L3 protocol to match on
> > + * @family - L3 protocol family to match on
> > + */
> > +struct bpf_xfrm_state_opts {
> > + s32 error;
> > + s32 netns_id;
> > + u32 mark;
> > + xfrm_address_t daddr;
> > + __be32 spi;
> > + u8 proto;
> > + u16 family;
> > +};
> > +
> > +enum {
> > + BPF_XFRM_STATE_OPTS_SZ = sizeof(struct bpf_xfrm_state_opts),
> > +};
> > +
> > +__bpf_kfunc_start_defs();
> > +
> > +/* bpf_xdp_get_xfrm_state - Get XFRM state
> > + *
> > + * Parameters:
> > + * @ctx - Pointer to ctx (xdp_md) in XDP program
> > + * Cannot be NULL
> > + * @opts - Options for lookup (documented above)
> > + * Cannot be NULL
> > + * @opts__sz - Length of the bpf_xfrm_state_opts structure
> > + * Must be BPF_XFRM_STATE_OPTS_SZ
> > + */
> > +__bpf_kfunc struct xfrm_state *
> > +bpf_xdp_get_xfrm_state(struct xdp_md *ctx, struct bpf_xfrm_state_opts *opts, u32 opts__sz)
> > +{
> > + struct xdp_buff *xdp = (struct xdp_buff *)ctx;
> > + struct net *net = dev_net(xdp->rxq->dev);
> > + struct xfrm_state *x;
> > +
> > + if (!opts || opts__sz < sizeof(opts->error))
> > + return NULL;
> > +
> > + if (opts__sz != BPF_XFRM_STATE_OPTS_SZ) {
> > + opts->error = -EINVAL;
> > + return NULL;
> > + }
> > +
> > + if (unlikely(opts->netns_id < BPF_F_CURRENT_NETNS)) {
> > + opts->error = -EINVAL;
> > + return NULL;
> > + }
> > +
> > + if (opts->netns_id >= 0) {
> > + net = get_net_ns_by_id(net, opts->netns_id);
> > + if (unlikely(!net)) {
> > + opts->error = -ENONET;
> > + return NULL;
> > + }
> > + }
> > +
> > + x = xfrm_state_lookup(net, opts->mark, &opts->daddr, opts->spi,
> > + opts->proto, opts->family);
> > +
> > + if (opts->netns_id >= 0)
> > + put_net(net);
>
> Maybe opts->error should be set to something like -ENOENT if x == NULL?

Originally I opted not to do that because xfrm_state_lookup() itself
doesn't report an error on a miss (e.g. by returning an ERR_PTR()); it
just returns NULL.

But I don't mind adding it - I think it's reasonable either way.
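
Roughly, the suggested behavior would look like the user-space sketch
below. This is only an illustration, not the kernel change itself: the
mock_* types and functions are hypothetical stand-ins for
bpf_xfrm_state_opts, xfrm_state_lookup() and the kfunc, and the SPI is
kept in host byte order for simplicity.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xfrm_state; not the kernel type. */
struct mock_state {
	uint32_t spi;
};

/* Hypothetical stand-in for struct bpf_xfrm_state_opts. */
struct mock_opts {
	int32_t error;	/* out parameter, set on failure */
	uint32_t spi;	/* key to match on */
};

/* Toy table standing in for the SA database. */
static struct mock_state mock_table[] = { { 0x100 }, { 0x200 } };

/* Stand-in for xfrm_state_lookup(): NULL on a miss, no error code. */
static struct mock_state *mock_lookup(uint32_t spi)
{
	size_t i;

	for (i = 0; i < sizeof(mock_table) / sizeof(mock_table[0]); i++)
		if (mock_table[i].spi == spi)
			return &mock_table[i];
	return NULL;
}

/* Wrapper that reports a miss through opts->error, as suggested. */
static struct mock_state *mock_get_state(struct mock_opts *opts)
{
	struct mock_state *x;

	if (!opts)
		return NULL;

	x = mock_lookup(opts->spi);
	if (!x)
		opts->error = -ENOENT;	/* distinguishes "not found" from other failures */

	return x;
}
```

With that, a caller that gets NULL back can check opts->error to tell
"no such SA" apart from a bad argument or a missing netns.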

[..]

Thanks,
Daniel