Re: [PATCH] [RFC] net: bpf: make __bpf_skb_max_len(skb) an skb-independent constant

From: Alexei Starovoitov
Date: Tue Apr 28 2020 - 13:54:03 EST


On Tue, Apr 21, 2020 at 01:36:08PM -0700, Maciej Åenczykowski wrote:
> > > This function is used from:
> > > bpf_skb_adjust_room
> > > __bpf_skb_change_tail
> > > __bpf_skb_change_head
> > >
> > > but in the case of forwarding we're likely calling these functions
> > > during receive processing on ingress and bpf_redirect()'ing at
> > > a later point in time to egress on another interface, thus these
> > > mtu checks are for the wrong device.
> >
> > Interesting. Without redirecting there should also be no reason
> > to do this check at ingress, right? So at ingress it's either
> > incorrect or unnecessary?
>
> Well, I guess there's technically a chance that you'd want to mutate
> the packet somehow during ingress pre-receive processing (without
> redirecting)...
> But yeah, I can't really think of a case where that would be
> increasing the size of the packet.
>
> Usually you'd be decapsulating at ingress and encapsulating at egress,
> or doing ingress rewrite & redirect to egress...
>
> (Also, note that relying on a sequence where at ingress you first call
> bpf_redirect(ifindex, EGRESS); then change the packet size, and then
> return TC_ACT_REDIRECT; thus being able to use the redirect ifindex
> for mtu checks in the packet mutation functions is potentially buggy,
> since there's no guarantee you won't call bpf_redirect again to change
> the ifinidex, or even return from the bpf program without returning
> TC_ACT_REDIRECT --- so while that could be *more* correct, it would
> still have holes...)

yeah. there is no good fix here, since target netdev is not known,
but dropping the check also doesn't seem right.
How about:
if (skb->dev) {
u32 header_len = skb->dev->hard_header_len;

if (!header_len)
header_len = ETH_HLEN;
return skb->dev->mtu + header_len;
} else {
return SKB_MAX_ALLOC;
}

the idea that l3 devices won't have l2 and here we will assume
that l2 can be added sooner or later.
It's not pretty either, but it will solve your wifi->eth use case?
While keeping basic sanity for other cases.