Re: [syzbot] [net?] WARNING in mpls_gso_segment

From: Eric Dumazet
Date: Thu Feb 22 2024 - 03:14:52 EST


On Wed, Feb 21, 2024 at 2:15 PM Florian Westphal <fw@xxxxxxxxx> wrote:
>
> syzbot <syzbot+99d15fcdb0132a1e1a82@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1536462c180000
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/adbf5d8e38d7/disk-49344462.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/0f8e3fb78410/vmlinux-49344462.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/682f4814bf23/bzImage-49344462.xz
> >
> > The issue was bisected to:
> >
> > commit 219eee9c0d16f1b754a8b85275854ab17df0850a
> > Author: Florian Westphal <fw@xxxxxxxxx>
> > Date: Fri Feb 16 11:36:57 2024 +0000
> >
> > net: skbuff: add overflow debug check to pull/push helpers
> >
> > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=13262752180000
> > final oops: https://syzkaller.appspot.com/x/report.txt?x=10a62752180000
> > console output: https://syzkaller.appspot.com/x/log.txt?x=17262752180000
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+99d15fcdb0132a1e1a82@xxxxxxxxxxxxxxxxxxxxxxxxx
> > Fixes: 219eee9c0d16 ("net: skbuff: add overflow debug check to pull/push helpers")
> >
> > ------------[ cut here ]------------
> > WARNING: CPU: 0 PID: 5068 at include/linux/skbuff.h:2723 pskb_may_pull_reason include/linux/skbuff.h:2723 [inline]
> > WARNING: CPU: 0 PID: 5068 at include/linux/skbuff.h:2723 pskb_may_pull include/linux/skbuff.h:2739 [inline]
> > WARNING: CPU: 0 PID: 5068 at include/linux/skbuff.h:2723 mpls_gso_segment+0x773/0xaa0 net/mpls/mpls_gso.c:34
>
> Two possible solutions:
>
> 1.)
>
> diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
> index 533d082f0701..43801b78dd64 100644
> --- a/net/mpls/mpls_gso.c
> +++ b/net/mpls/mpls_gso.c
> @@ -25,12 +25,13 @@ static struct sk_buff *mpls_gso_segment(struct sk_buff *skb,
>  	netdev_features_t mpls_features;
>  	u16 mac_len = skb->mac_len;
>  	__be16 mpls_protocol;
> -	unsigned int mpls_hlen;
> +	int mpls_hlen;
>
>  	skb_reset_network_header(skb);
>  	mpls_hlen = skb_inner_network_header(skb) - skb_network_header(skb);
> -	if (unlikely(!mpls_hlen || mpls_hlen % MPLS_HLEN))
> +	if (unlikely(mpls_hlen <= 0 || mpls_hlen % MPLS_HLEN))
>  		goto out;
> +
>  	if (unlikely(!pskb_may_pull(skb, mpls_hlen)))
>  		goto out;

I guess we should try this, or perhaps understand why
skb->encapsulation might not be set,
or why skb_inner_network_header(skb) is not set at this point.

>
> (or a variation thereof).
>
> 2) revert the pskb_may_pull_reason change added in 219eee9c0d16f1b754a8 to
> make it tolerant to "negative" (huge) may-pull requests again.
>
> With the above repro, skb_inner_network_header() yields 0 and skb_network_header()
> returns 108, so we call "pskb_may_pull(skb, -108)", which now triggers the
> DEBUG_NET_WARN_ON_ONCE() check.
>
> Before the blamed commit, this would make pskb_may_pull hit:
>
> 	if (unlikely(len > skb->len))
> 		return SKB_DROP_REASON_PKT_TOO_SMALL;
>
> and mpls_gso_segment would take the 'goto out' label.
>
> So the question is really whether we should fix this in mpls_gso (and possibly others
> that try to pull negative numbers...) or whether we should legalize this, either by
> adding an explicit if (unlikely(len > INT_MAX)) test to pskb_may_pull_reason or
> by adding a comment that negative 'len' values are expected to be caught by
> the check vs. skb->len.
>
> Opinions?

Let's live without 2) for a while and try to fix the callers?
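
For reference, here is a rough user-space sketch of the arithmetic behind 1),
using the offsets from the repro (inner network header at 0, network header
at 108). It is not kernel code: the helper names are made up for illustration,
plain int offsets stand in for the skb header pointers, and the debug check is
assumed to be a simple "len > INT_MAX" test. It also assumes a 32-bit
unsigned int.

```c
#include <limits.h>
#include <stdbool.h>

#define MPLS_HLEN 4 /* size of one MPLS label stack entry, in bytes */

/* Hypothetical helper: the check as it is today, with an unsigned mpls_hlen. */
static bool old_check_rejects(int inner_off, int net_off)
{
	unsigned int mpls_hlen = (unsigned int)(inner_off - net_off);

	/* -108 wraps to 4294967188, which is non-zero and a multiple of 4. */
	return !mpls_hlen || mpls_hlen % MPLS_HLEN;
}

/* Hypothetical helper: the check from the proposed diff, with a signed mpls_hlen. */
static bool new_check_rejects(int inner_off, int net_off)
{
	int mpls_hlen = inner_off - net_off;

	/* A negative difference is caught here, before any pull is attempted. */
	return mpls_hlen <= 0 || mpls_hlen % MPLS_HLEN;
}

/* Assumed shape of the debug condition: a length that cannot fit in an int. */
static bool would_trip_debug_check(unsigned int len)
{
	return len > INT_MAX;
}
```

With (0, 108) the old check lets the length through and the wrapped value
then trips the debug condition, while the signed variant bails out early.
A sane layout such as (116, 108), i.e. two labels, passes both checks.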