Re: [Intel-wired-lan] bug with rx-udp-gro-forwarding offloading?

From: Ian Kumlien
Date: Wed Jun 28 2023 - 07:47:51 EST


On Wed, Jun 28, 2023 at 11:06 AM Paolo Abeni <pabeni@xxxxxxxxxx> wrote:
>
> Hello,
>
> On Wed, 2023-06-28 at 09:37 +0200, Ian Kumlien wrote:
> > Been running all night but eventually it crashed again...
> >
> > [21753.055795] Out of memory: Killed process 970 (qemu-system-x86)
> > total-vm:4709488kB, anon-rss:2172652kB, file-rss:4608kB,
> > shmem-rss:0kB, UID:77 pgtables:4800kB oom_score_adj:0
> > [24249.061154] general protection fault, probably for non-canonical
> > address 0xb0746d4e6bee35e2: 0000 [#1] PREEMPT SMP NOPTI
> > [24249.072138] CPU: 0 PID: 893 Comm: napi/eno1-68 Tainted: G W
> > 6.4.0-dirty #366
> > [24249.080670] Hardware name: Supermicro Super Server/A2SDi-12C-HLN4F,
> > BIOS 1.7a 10/13/2022
> > [24249.088852] RIP: 0010:kmem_cache_alloc_bulk (mm/slub.c:377
> > mm/slub.c:388 mm/slub.c:395 mm/slub.c:3963 mm/slub.c:4026)
> > [24249.094086] Code: 0f 84 46 ff ff ff 65 ff 05 a4 bd e4 47 48 8b 4d
> > 00 65 48 03 0d e8 5f e3 47 9c 5e fa 45 31 d2 eb 2f 8b 45 28 48 01 d0
> > 48 89 c7 <48> 8b 00 48 33 85 b8 00 00 00 48 0f cf 48 31 f8 48 89 01 49
> > 89 17
> > All code
> > ========
> > 0: 0f 84 46 ff ff ff je 0xffffffffffffff4c
> > 6: 65 ff 05 a4 bd e4 47 incl %gs:0x47e4bda4(%rip) # 0x47e4bdb1
> > d: 48 8b 4d 00 mov 0x0(%rbp),%rcx
> > 11: 65 48 03 0d e8 5f e3 add %gs:0x47e35fe8(%rip),%rcx # 0x47e36001
> > 18: 47
> > 19: 9c pushf
> > 1a: 5e pop %rsi
> > 1b: fa cli
> > 1c: 45 31 d2 xor %r10d,%r10d
> > 1f: eb 2f jmp 0x50
> > 21: 8b 45 28 mov 0x28(%rbp),%eax
> > 24: 48 01 d0 add %rdx,%rax
> > 27: 48 89 c7 mov %rax,%rdi
> > 2a:* 48 8b 00 mov (%rax),%rax <-- trapping instruction
> > 2d: 48 33 85 b8 00 00 00 xor 0xb8(%rbp),%rax
> > 34: 48 0f cf bswap %rdi
> > 37: 48 31 f8 xor %rdi,%rax
> > 3a: 48 89 01 mov %rax,(%rcx)
> > 3d: 49 89 17 mov %rdx,(%r15)
> >
> > Code starting with the faulting instruction
> > ===========================================
> > 0: 48 8b 00 mov (%rax),%rax
> > 3: 48 33 85 b8 00 00 00 xor 0xb8(%rbp),%rax
> > a: 48 0f cf bswap %rdi
> > d: 48 31 f8 xor %rdi,%rax
> > 10: 48 89 01 mov %rax,(%rcx)
> > 13: 49 89 17 mov %rdx,(%r15)
> > [24249.112951] RSP: 0018:ffff9fc303973d20 EFLAGS: 00010086
> > [24249.118275] RAX: b0746d4e6bee35e2 RBX: 0000000000000001 RCX: ffff8d5a2fa31da0
> > [24249.125501] RDX: b0746d4e6bee3572 RSI: 0000000000000286 RDI: b0746d4e6bee35e2
> > [24249.132730] RBP: ffff8d56c016d500 R08: 0000000000000400 R09: ffff8d56ede0e67a
> > [24249.139958] R10: 0000000000000001 R11: ffff8d56c59d88c0 R12: 0000000000000010
> > [24249.147187] R13: 0000000000000820 R14: ffff8d5a2fa2a810 R15: ffff8d5a2fa2a818
> > [24249.154415] FS: 0000000000000000(0000) GS:ffff8d5a2fa00000(0000)
> > knlGS:0000000000000000
> > [24249.162620] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [24249.168471] CR2: 00007f0f3f7f8760 CR3: 0000000102466000 CR4: 00000000003526f0
> > [24249.175717] Call Trace:
> > [24249.178268] <TASK>
> > [24249.180476] ? die_addr (arch/x86/kernel/dumpstack.c:421
> > arch/x86/kernel/dumpstack.c:460)
> > [24249.183907] ? exc_general_protection (arch/x86/kernel/traps.c:783
> > arch/x86/kernel/traps.c:728)
> > [24249.188726] ? asm_exc_general_protection
> > (./arch/x86/include/asm/idtentry.h:564)
> > [24249.193720] ? kmem_cache_alloc_bulk (mm/slub.c:377 mm/slub.c:388
> > mm/slub.c:395 mm/slub.c:3963 mm/slub.c:4026)
> > [24249.198361] ? netif_receive_skb_list_internal (net/core/dev.c:5729)
> > [24249.203960] napi_skb_cache_get (net/core/skbuff.c:338)
> > [24249.208078] __napi_build_skb (net/core/skbuff.c:517)
> > [24249.211934] napi_build_skb (net/core/skbuff.c:541)
> > [24249.215616] ixgbe_poll
> > (drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:2165
> > drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:2361
> > drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:3178)
> > [24249.219305] __napi_poll (net/core/dev.c:6498)
> > [24249.222905] napi_threaded_poll (./include/linux/netpoll.h:89
> > net/core/dev.c:6640)
> > [24249.227197] ? __napi_poll (net/core/dev.c:6625)
> > [24249.231050] kthread (kernel/kthread.c:379)
> > [24249.234300] ? kthread_complete_and_exit (kernel/kthread.c:332)
> > [24249.239207] ret_from_fork (arch/x86/entry/entry_64.S:314)
> > [24249.242892] </TASK>
> > [24249.245185] Modules linked in: chaoskey
> > [24249.249133] ---[ end trace 0000000000000000 ]---
> > [24249.270157] pstore: backend (erst) writing error (-28)
> > [24249.275408] RIP: 0010:kmem_cache_alloc_bulk (mm/slub.c:377
> > mm/slub.c:388 mm/slub.c:395 mm/slub.c:3963 mm/slub.c:4026)
> > [24249.280660] Code: 0f 84 46 ff ff ff 65 ff 05 a4 bd e4 47 48 8b 4d
> > 00 65 48 03 0d e8 5f e3 47 9c 5e fa 45 31 d2 eb 2f 8b 45 28 48 01 d0
> > 48 89 c7 <48> 8b 00 48 33 85 b8 00 00 00 48 0f cf 48 31 f8 48 89 01 49
> > 89 17
> > All code
> > ========
> > 0: 0f 84 46 ff ff ff je 0xffffffffffffff4c
> > 6: 65 ff 05 a4 bd e4 47 incl %gs:0x47e4bda4(%rip) # 0x47e4bdb1
> > d: 48 8b 4d 00 mov 0x0(%rbp),%rcx
> > 11: 65 48 03 0d e8 5f e3 add %gs:0x47e35fe8(%rip),%rcx # 0x47e36001
> > 18: 47
> > 19: 9c pushf
> > 1a: 5e pop %rsi
> > 1b: fa cli
> > 1c: 45 31 d2 xor %r10d,%r10d
> > 1f: eb 2f jmp 0x50
> > 21: 8b 45 28 mov 0x28(%rbp),%eax
> > 24: 48 01 d0 add %rdx,%rax
> > 27: 48 89 c7 mov %rax,%rdi
> > 2a:* 48 8b 00 mov (%rax),%rax <-- trapping instruction
> > 2d: 48 33 85 b8 00 00 00 xor 0xb8(%rbp),%rax
> > 34: 48 0f cf bswap %rdi
> > 37: 48 31 f8 xor %rdi,%rax
> > 3a: 48 89 01 mov %rax,(%rcx)
> > 3d: 49 89 17 mov %rdx,(%r15)
> >
> > Code starting with the faulting instruction
> > ===========================================
> > 0: 48 8b 00 mov (%rax),%rax
> > 3: 48 33 85 b8 00 00 00 xor 0xb8(%rbp),%rax
> > a: 48 0f cf bswap %rdi
> > d: 48 31 f8 xor %rdi,%rax
> > 10: 48 89 01 mov %rax,(%rcx)
> > 13: 49 89 17 mov %rdx,(%r15)
> > [24249.299578] RSP: 0018:ffff9fc303973d20 EFLAGS: 00010086
> > [24249.304917] RAX: b0746d4e6bee35e2 RBX: 0000000000000001 RCX: ffff8d5a2fa31da0
> > [24249.312161] RDX: b0746d4e6bee3572 RSI: 0000000000000286 RDI: b0746d4e6bee35e2
> > [24249.319407] RBP: ffff8d56c016d500 R08: 0000000000000400 R09: ffff8d56ede0e67a
> > [24249.326651] R10: 0000000000000001 R11: ffff8d56c59d88c0 R12: 0000000000000010
> > [24249.333896] R13: 0000000000000820 R14: ffff8d5a2fa2a810 R15: ffff8d5a2fa2a818
> > [24249.341141] FS: 0000000000000000(0000) GS:ffff8d5a2fa00000(0000)
> > knlGS:0000000000000000
> > [24249.349356] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [24249.355206] CR2: 00007f0f3f7f8760 CR3: 0000000102466000 CR4: 00000000003526f0
> > [24249.362452] Kernel panic - not syncing: Fatal exception in interrupt
> > [24249.566854] Kernel Offset: 0x36e00000 from 0xffffffff81000000
> > (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > [24249.594124] ---[ end Kernel panic - not syncing: Fatal exception in
> > interrupt ]---
> >
> > It's also odd that i get a OOM - it only seems to happen when i enable
> > rx-gro-list
>
> Unfortunately, not the result I was looking for. That leads to more
> questions then answer, I'm sorry.

I understand you...

> How long did the host keep going with rx-gro-list enabled?

Well, hours...

reboot system boot 6.4.0-dirty Wed Jun 28 04:20 - 13:39 (09:19)
reboot system boot 6.4.0-dirty Tue Jun 27 21:31 - 13:39 (16:08)

So, lets imagine a few seconds to login and enable everything

> Did you observe the WARN_ON() introduced by the tentative fix?

I could only see the console, so saw nothing...

> > - it's also odd because this machine always has ~8GB of
> > memory available
>
> It looks like there is a memory leak somewhere, and I don't think the
> tentative fixup introduced such issue.

I agree, it was there before...

> It looks like the above splat is due to a slab corruption, which in
> turn could be unrelated from the mentioned leak, but it could/should
> be related to rx-gro-list.

Agreed =)

> Could you please run the test with both kmemleak and kasan enabled?

Machine-slowdown-enabled^tm

> Additionally could you please disclose if you have non trivial
> netfilter and/or bridge filter and/or tc rules possibly modifying the
> incoming/egress packets?

I only have basic reject accept rules, some snat/dnat pairs, but i
don't see it ending up in "non trivial" ;)

> If kasan is not an option, could you please apply the debug the patch
> below? (on top of the previous one)

I actually did both, if it's unrelated we should know as well..

I hope i have something for you before tomorrow, else there will be a
bit of a break until next week

> Thanks!
>
> Paolo
> ---
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 6c5915efbc17..94adca27b205 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4295,6 +4295,8 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
> delta_len += nskb->len;
>
> skb_push(nskb, -skb_network_offset(nskb) + offset);
> + if (WARN_ON_ONCE(nskb->data - skb->head > skb->tail))
> + goto err_linearize;
>
> skb_release_head_state(nskb);
> len_diff = skb_network_header_len(nskb) - skb_network_header_len(skb);
> @@ -4302,6 +4304,11 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
>
> skb_headers_offset_update(nskb, skb_headroom(nskb) - skb_headroom(skb));
> nskb->transport_header += len_diff;
> + if (WARN_ON_ONCE(tnl_hlen > skb_headroom(nskb)))
> + goto err_linearize;
> + if (WARN_ON_ONCE(skb_headroom(nskb) + offset > nskb->tail))
> + goto err_linearize;
> +
> skb_copy_from_linear_data_offset(skb, -tnl_hlen,
> nskb->data - tnl_hlen,
> offset + tnl_hlen);
>
>