Re: [Intel-wired-lan] bug with rx-udp-gro-forwarding offloading?

From: Ian Kumlien
Date: Wed Jun 28 2023 - 04:22:02 EST


Been running all night but eventually it crashed again...

[21753.055795] Out of memory: Killed process 970 (qemu-system-x86)
total-vm:4709488kB, anon-rss:2172652kB, file-rss:4608kB,
shmem-rss:0kB, UID:77 pgtables:4800kB oom_score_adj:0
[24249.061154] general protection fault, probably for non-canonical
address 0xb0746d4e6bee35e2: 0000 [#1] PREEMPT SMP NOPTI
[24249.072138] CPU: 0 PID: 893 Comm: napi/eno1-68 Tainted: G W
6.4.0-dirty #366
[24249.080670] Hardware name: Supermicro Super Server/A2SDi-12C-HLN4F,
BIOS 1.7a 10/13/2022
[24249.088852] RIP: 0010:kmem_cache_alloc_bulk (mm/slub.c:377
mm/slub.c:388 mm/slub.c:395 mm/slub.c:3963 mm/slub.c:4026)
[24249.094086] Code: 0f 84 46 ff ff ff 65 ff 05 a4 bd e4 47 48 8b 4d
00 65 48 03 0d e8 5f e3 47 9c 5e fa 45 31 d2 eb 2f 8b 45 28 48 01 d0
48 89 c7 <48> 8b 00 48 33 85 b8 00 00 00 48 0f cf 48 31 f8 48 89 01 49
89 17
All code
========
0: 0f 84 46 ff ff ff je 0xffffffffffffff4c
6: 65 ff 05 a4 bd e4 47 incl %gs:0x47e4bda4(%rip) # 0x47e4bdb1
d: 48 8b 4d 00 mov 0x0(%rbp),%rcx
11: 65 48 03 0d e8 5f e3 add %gs:0x47e35fe8(%rip),%rcx # 0x47e36001
18: 47
19: 9c pushf
1a: 5e pop %rsi
1b: fa cli
1c: 45 31 d2 xor %r10d,%r10d
1f: eb 2f jmp 0x50
21: 8b 45 28 mov 0x28(%rbp),%eax
24: 48 01 d0 add %rdx,%rax
27: 48 89 c7 mov %rax,%rdi
2a:* 48 8b 00 mov (%rax),%rax <-- trapping instruction
2d: 48 33 85 b8 00 00 00 xor 0xb8(%rbp),%rax
34: 48 0f cf bswap %rdi
37: 48 31 f8 xor %rdi,%rax
3a: 48 89 01 mov %rax,(%rcx)
3d: 49 89 17 mov %rdx,(%r15)

Code starting with the faulting instruction
===========================================
0: 48 8b 00 mov (%rax),%rax
3: 48 33 85 b8 00 00 00 xor 0xb8(%rbp),%rax
a: 48 0f cf bswap %rdi
d: 48 31 f8 xor %rdi,%rax
10: 48 89 01 mov %rax,(%rcx)
13: 49 89 17 mov %rdx,(%r15)
[24249.112951] RSP: 0018:ffff9fc303973d20 EFLAGS: 00010086
[24249.118275] RAX: b0746d4e6bee35e2 RBX: 0000000000000001 RCX: ffff8d5a2fa31da0
[24249.125501] RDX: b0746d4e6bee3572 RSI: 0000000000000286 RDI: b0746d4e6bee35e2
[24249.132730] RBP: ffff8d56c016d500 R08: 0000000000000400 R09: ffff8d56ede0e67a
[24249.139958] R10: 0000000000000001 R11: ffff8d56c59d88c0 R12: 0000000000000010
[24249.147187] R13: 0000000000000820 R14: ffff8d5a2fa2a810 R15: ffff8d5a2fa2a818
[24249.154415] FS: 0000000000000000(0000) GS:ffff8d5a2fa00000(0000)
knlGS:0000000000000000
[24249.162620] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[24249.168471] CR2: 00007f0f3f7f8760 CR3: 0000000102466000 CR4: 00000000003526f0
[24249.175717] Call Trace:
[24249.178268] <TASK>
[24249.180476] ? die_addr (arch/x86/kernel/dumpstack.c:421
arch/x86/kernel/dumpstack.c:460)
[24249.183907] ? exc_general_protection (arch/x86/kernel/traps.c:783
arch/x86/kernel/traps.c:728)
[24249.188726] ? asm_exc_general_protection
(./arch/x86/include/asm/idtentry.h:564)
[24249.193720] ? kmem_cache_alloc_bulk (mm/slub.c:377 mm/slub.c:388
mm/slub.c:395 mm/slub.c:3963 mm/slub.c:4026)
[24249.198361] ? netif_receive_skb_list_internal (net/core/dev.c:5729)
[24249.203960] napi_skb_cache_get (net/core/skbuff.c:338)
[24249.208078] __napi_build_skb (net/core/skbuff.c:517)
[24249.211934] napi_build_skb (net/core/skbuff.c:541)
[24249.215616] ixgbe_poll
(drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:2165
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:2361
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:3178)
[24249.219305] __napi_poll (net/core/dev.c:6498)
[24249.222905] napi_threaded_poll (./include/linux/netpoll.h:89
net/core/dev.c:6640)
[24249.227197] ? __napi_poll (net/core/dev.c:6625)
[24249.231050] kthread (kernel/kthread.c:379)
[24249.234300] ? kthread_complete_and_exit (kernel/kthread.c:332)
[24249.239207] ret_from_fork (arch/x86/entry/entry_64.S:314)
[24249.242892] </TASK>
[24249.245185] Modules linked in: chaoskey
[24249.249133] ---[ end trace 0000000000000000 ]---
[24249.270157] pstore: backend (erst) writing error (-28)
[24249.275408] RIP: 0010:kmem_cache_alloc_bulk (mm/slub.c:377
mm/slub.c:388 mm/slub.c:395 mm/slub.c:3963 mm/slub.c:4026)
[24249.280660] Code: 0f 84 46 ff ff ff 65 ff 05 a4 bd e4 47 48 8b 4d
00 65 48 03 0d e8 5f e3 47 9c 5e fa 45 31 d2 eb 2f 8b 45 28 48 01 d0
48 89 c7 <48> 8b 00 48 33 85 b8 00 00 00 48 0f cf 48 31 f8 48 89 01 49
89 17
All code
========
0: 0f 84 46 ff ff ff je 0xffffffffffffff4c
6: 65 ff 05 a4 bd e4 47 incl %gs:0x47e4bda4(%rip) # 0x47e4bdb1
d: 48 8b 4d 00 mov 0x0(%rbp),%rcx
11: 65 48 03 0d e8 5f e3 add %gs:0x47e35fe8(%rip),%rcx # 0x47e36001
18: 47
19: 9c pushf
1a: 5e pop %rsi
1b: fa cli
1c: 45 31 d2 xor %r10d,%r10d
1f: eb 2f jmp 0x50
21: 8b 45 28 mov 0x28(%rbp),%eax
24: 48 01 d0 add %rdx,%rax
27: 48 89 c7 mov %rax,%rdi
2a:* 48 8b 00 mov (%rax),%rax <-- trapping instruction
2d: 48 33 85 b8 00 00 00 xor 0xb8(%rbp),%rax
34: 48 0f cf bswap %rdi
37: 48 31 f8 xor %rdi,%rax
3a: 48 89 01 mov %rax,(%rcx)
3d: 49 89 17 mov %rdx,(%r15)

Code starting with the faulting instruction
===========================================
0: 48 8b 00 mov (%rax),%rax
3: 48 33 85 b8 00 00 00 xor 0xb8(%rbp),%rax
a: 48 0f cf bswap %rdi
d: 48 31 f8 xor %rdi,%rax
10: 48 89 01 mov %rax,(%rcx)
13: 49 89 17 mov %rdx,(%r15)
[24249.299578] RSP: 0018:ffff9fc303973d20 EFLAGS: 00010086
[24249.304917] RAX: b0746d4e6bee35e2 RBX: 0000000000000001 RCX: ffff8d5a2fa31da0
[24249.312161] RDX: b0746d4e6bee3572 RSI: 0000000000000286 RDI: b0746d4e6bee35e2
[24249.319407] RBP: ffff8d56c016d500 R08: 0000000000000400 R09: ffff8d56ede0e67a
[24249.326651] R10: 0000000000000001 R11: ffff8d56c59d88c0 R12: 0000000000000010
[24249.333896] R13: 0000000000000820 R14: ffff8d5a2fa2a810 R15: ffff8d5a2fa2a818
[24249.341141] FS: 0000000000000000(0000) GS:ffff8d5a2fa00000(0000)
knlGS:0000000000000000
[24249.349356] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[24249.355206] CR2: 00007f0f3f7f8760 CR3: 0000000102466000 CR4: 00000000003526f0
[24249.362452] Kernel panic - not syncing: Fatal exception in interrupt
[24249.566854] Kernel Offset: 0x36e00000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[24249.594124] ---[ end Kernel panic - not syncing: Fatal exception in
interrupt ]---
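
A quick sanity check of the register dump above, written as a user-space C
sketch rather than kernel code. It assumes 48-bit virtual addresses (4-level
paging), and the helper names are made up for illustration. It confirms that
RAX is non-canonical, and that the "mov 0x28(%rbp),%eax; add %rdx,%rax" pair
implies RAX = RDX + 0x70. That would be consistent with a corrupted pointer in
RDX, though the trace alone does not prove where the corruption came from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Register values copied from the oops above. */
#define TRACE_RAX 0xb0746d4e6bee35e2ULL /* faulting address */
#define TRACE_RDX 0xb0746d4e6bee3572ULL /* base that the loaded offset is added to */
#define TRACE_RBP 0xffff8d56c016d500ULL /* a normal-looking kernel pointer */

/*
 * Assumption: 48-bit virtual addresses. An address is canonical when
 * bits 63..47 are all copies of bit 47, i.e. all zeros or all ones.
 */
static bool is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47; /* the 17 bits 63..47 */

	return top == 0 || top == 0x1ffff;
}

/* "mov 0x28(%rbp),%eax; add %rdx,%rax" computes RAX = RDX + offset. */
static uint64_t implied_offset(uint64_t rax, uint64_t rdx)
{
	return rax - rdx;
}

/* Checks the arithmetic against the values printed in the trace. */
static void self_check(void)
{
	assert(!is_canonical_48(TRACE_RAX));
	assert(is_canonical_48(TRACE_RBP));
	assert(implied_offset(TRACE_RAX, TRACE_RDX) == 0x70);
}
```

Calling self_check() from a small main() is enough to run it; the asserts
pass silently when the arithmetic matches the trace.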

It's also odd that I get an OOM kill at all. It only seems to happen when I
enable rx-gro-list, and this machine always has ~8GB of memory available.

On Tue, Jun 27, 2023 at 2:31 PM Ian Kumlien <ian.kumlien@xxxxxxxxx> wrote:
>
> On Tue, Jun 27, 2023 at 11:19 AM Paolo Abeni <pabeni@xxxxxxxxxx> wrote:
> >
> > On Mon, 2023-06-26 at 20:59 +0200, Ian Kumlien wrote:
> > > On Mon, Jun 26, 2023 at 8:20 PM Ian Kumlien <ian.kumlien@xxxxxxxxx> wrote:
> > > >
> > > > Never mind, I think I found it. I will loop this thing until I have a
> > > > proper trace....
> > >
> > > Still some question marks, but much better
> >
> > Thanks!
> > >
> > > cat bug.txt | ./scripts/decode_stacktrace.sh vmlinux
> > > [ 62.624003] BUG: kernel NULL pointer dereference, address: 00000000000000c0
> > > [ 62.631083] #PF: supervisor read access in kernel mode
> > > [ 62.636312] #PF: error_code(0x0000) - not-present page
> > > [ 62.641541] PGD 0 P4D 0
> > > [ 62.644174] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > > [ 62.648629] CPU: 1 PID: 913 Comm: napi/eno2-79 Not tainted 6.4.0 #364
> > > [ 62.655162] Hardware name: Supermicro Super Server/A2SDi-12C-HLN4F,
> > > BIOS 1.7a 10/13/2022
> > > [ 62.663344] RIP: 0010:__udp_gso_segment
> > > (./include/linux/skbuff.h:2858 ./include/linux/udp.h:23
> > > net/ipv4/udp_offload.c:228 net/ipv4/udp_offload.c:261
> > > net/ipv4/udp_offload.c:277)
> >
> > So it's faulting here:
> >
> > static struct sk_buff *__udpv4_gso_segment_list_csum(struct sk_buff *segs)
> > {
> > 	struct sk_buff *seg;
> > 	struct udphdr *uh, *uh2;
> > 	struct iphdr *iph, *iph2;
> >
> > 	seg = segs;
> > 	uh = udp_hdr(seg);
> > 	iph = ip_hdr(seg);
> >
> > 	if ((udp_hdr(seg)->dest == udp_hdr(seg->next)->dest) &&
> > 	//  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > The GSO segment has been assembled by skb_gro_receive_list().
> > I guess seg->next is NULL, which is somewhat unexpected as
> > napi_gro_complete() clears the gso_size when sending a single frame
> > up the stack.
> >
> > On the flip side, AFAICS, nothing prevents the stack from changing the
> > aggregated packet layout (e.g. pulling data and/or linearizing the
> > skb).
> >
> > In any case this looks more related to rx-gro-list than to
> > rx-udp-gro-forwarding. I understand you have both features enabled in
> > your env?
> >
> > Side question: do you have any non-trivial nf/br filter rules?
> >
> > The following could possibly validate the above and avoid the issue,
> > but it's a bit papering over it. Could you please try it in your env?
>
> Will do as soon as I get home =)
>
> > Thanks!
> >
> > Paolo
> > ---
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 6c5915efbc17..75531686bfdf 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -4319,6 +4319,9 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
> >
> >  	skb->prev = tail;
> >
> > +	if (WARN_ON_ONCE(!skb->next))
> > +		goto err_linearize;
> > +
> >  	if (skb_needs_linearize(skb, features) &&
> >  	    __skb_linearize(skb))
> >  		goto err_linearize;
> >
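
To illustrate the idea behind the quoted patch outside the kernel, here is a
minimal user-space sketch. struct seg and first_two_share_dest() are
hypothetical stand-ins, not kernel types or functions: the struct keeps only a
next pointer and a destination port, and the function refuses to look at
seg->next when the list holds a single segment. The real patch places its
check in skb_segment_list() and bails out via err_linearize; this only mirrors
the "check next before dereferencing it" shape.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sk_buff: only what the sketch needs. */
struct seg {
	struct seg *next;
	uint16_t dest; /* stand-in for udp_hdr(seg)->dest */
};

/*
 * Returns 1 if the first two segments share a destination port, 0 if they
 * differ, and -1 if the list has fewer than two segments. The -1 path is
 * the analogue of the WARN_ON_ONCE(!skb->next) bail-out in the patch.
 */
static int first_two_share_dest(const struct seg *segs)
{
	if (!segs || !segs->next)
		return -1;

	return segs->dest == segs->next->dest;
}

/* Exercises the three outcomes with a two-element list. */
static void demo(void)
{
	struct seg second = { NULL, 53 };
	struct seg first = { &second, 53 };

	assert(first_two_share_dest(&first) == 1);   /* same port */
	assert(first_two_share_dest(&second) == -1); /* single segment */

	second.dest = 54;
	assert(first_two_share_dest(&first) == 0);   /* ports differ */
}
```

Without the NULL check, the single-segment case would dereference a NULL next
pointer, which is the kind of access the first oops reports.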