Re: [Intel-wired-lan] bug with rx-udp-gro-forwarding offloading?

From: Ian Kumlien
Date: Mon Jun 26 2023 - 14:02:12 EST


On Mon, Jun 26, 2023 at 7:56 PM Paolo Abeni <pabeni@xxxxxxxxxx> wrote:
>
> On Mon, 2023-06-26 at 19:30 +0200, Ian Kumlien wrote:
> > There, that didn't take long, even with wireguard disabled
> >
> > [14079.678380] BUG: kernel NULL pointer dereference, address: 00000000000000c0
> > [14079.685456] #PF: supervisor read access in kernel mode
> > [14079.690686] #PF: error_code(0x0000) - not-present page
> > [14079.695915] PGD 0 P4D 0
> > [14079.698540] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > [14079.702996] CPU: 11 PID: 891 Comm: napi/eno2-80 Not tainted 6.4.0 #360
> > [14079.709614] Hardware name: Supermicro Super Server/A2SDi-12C-HLN4F,
> > BIOS 1.7a 10/13/2022
> > [14079.717796] RIP: 0010:__udp_gso_segment+0x346/0x4f0
> > [14079.722778] Code: c3 08 66 89 5c 02 04 45 84 e4 0f 85 27 fd ff ff
> > 49 8b 1e 49 8b ae c0 00 00 00 41 0f b7 86 b4 00 00 00 45 0f b7 a6 b2
> > 00 00 00 <48> 8b b3 c0 00 00 00 0f b7 8b b2 00 00 00 49 01 ec 48 01 c5
> > 48 8d
> > [14079.741645] RSP: 0018:ffffa83643a4f818 EFLAGS: 00010246
> > [14079.746966] RAX: 00000000000000ce RBX: 0000000000000000 RCX: 0000000000000000
> > [14079.754195] RDX: ffffa2ad1403b000 RSI: 0000000000000028 RDI: ffffa2afc9d302d4
> > [14079.761422] RBP: ffffa2ad1403b000 R08: 0000000000000022 R09: 00002000001558c9
> > [14079.768650] R10: 0000000000000000 R11: ffffa2b02fcea888 R12: 00000000000000e2
> > [14079.775879] R13: ffffa2afc9d30200 R14: ffffa2afc9d30200 R15: 00002000001558c9
> > [14079.783106] FS: 0000000000000000(0000) GS:ffffa2b02fcc0000(0000)
> > knlGS:0000000000000000
> > [14079.791305] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [14079.797162] CR2: 00000000000000c0 CR3: 0000000151ff4000 CR4: 00000000003526e0
> > [14079.804408] Call Trace:
> > [14079.806961] <TASK>
> > [14079.809170] ? __die+0x1a/0x60
> > [14079.812340] ? page_fault_oops+0x158/0x440
> > [14079.816551] ? ip6_route_output_flags+0xe3/0x160
> > [14079.821284] ? exc_page_fault+0x3f4/0x820
> > [14079.825408] ? update_load_avg+0x77/0x710
> > [14079.829534] ? asm_exc_page_fault+0x22/0x30
> > [14079.833836] ? __udp_gso_segment+0x346/0x4f0
> > [14079.838218] ? __udp_gso_segment+0x2fa/0x4f0
> > [14079.842600] ? _raw_spin_unlock_irqrestore+0x16/0x30
> > [14079.847679] ? try_to_wake_up+0x8e/0x5a0
> > [14079.851713] inet_gso_segment+0x150/0x3c0
> > [14079.855827] ? vhost_poll_wakeup+0x31/0x40
> > [14079.860032] skb_mac_gso_segment+0x9b/0x110
> > [14079.864331] __skb_gso_segment+0xae/0x160
> > [14079.868455] ? netif_skb_features+0x144/0x290
> > [14079.872928] validate_xmit_skb+0x167/0x370
> > [14079.877139] validate_xmit_skb_list+0x43/0x70
> > [14079.881612] sch_direct_xmit+0x267/0x380
> > [14079.885641] __qdisc_run+0x140/0x590
> > [14079.889324] __dev_queue_xmit+0x44d/0xba0
> > [14079.893450] ? nf_hook_slow+0x3c/0xb0
> > [14079.897229] br_dev_queue_push_xmit+0xb2/0x1c0
> > [14079.901788] maybe_deliver+0xa9/0x100
> > [14079.905564] br_flood+0x8a/0x180
> > [14079.908903] br_handle_frame_finish+0x31f/0x5b0
> > [14079.913547] br_handle_frame+0x28f/0x3a0
> > [14079.917585] ? ipv6_find_hdr+0x1f0/0x3e0
> > [14079.921622] ? br_handle_local_finish+0x20/0x20
> > [14079.926267] __netif_receive_skb_core.constprop.0+0x4c5/0xc90
> > [14079.932125] ? br_handle_frame_finish+0x5b0/0x5b0
> > [14079.936946] ? ___slab_alloc+0x4bf/0xaf0
> > [14079.940986] __netif_receive_skb_list_core+0x107/0x250
> > [14079.946240] netif_receive_skb_list_internal+0x194/0x2b0
> > [14079.951660] ? napi_gro_flush+0x97/0xf0
> > [14079.955604] napi_complete_done+0x69/0x180
> > [14079.959808] ixgbe_poll+0xe10/0x12e0
> > [14079.963506] __napi_poll+0x26/0x1b0
> > [14079.967106] napi_threaded_poll+0x232/0x250
> > [14079.971405] ? __napi_poll+0x1b0/0x1b0
> > [14079.975260] kthread+0xee/0x120
> > [14079.978510] ? kthread_complete_and_exit+0x20/0x20
> > [14079.983415] ret_from_fork+0x22/0x30
> > [14079.987102] </TASK>
> > [14079.989395] Modules linked in: chaoskey
> > [14079.993347] CR2: 00000000000000c0
> > [14079.996773] ---[ end trace 0000000000000000 ]---
> > [14080.018013] pstore: backend (erst) writing error (-28)
> > [14080.023274] RIP: 0010:__udp_gso_segment+0x346/0x4f0
> > [14080.028264] Code: c3 08 66 89 5c 02 04 45 84 e4 0f 85 27 fd ff ff
> > 49 8b 1e 49 8b ae c0 00 00 00 41 0f b7 86 b4 00 00 00 45 0f b7 a6 b2
> > 00 00 00 <48> 8b b3 c0 00 00 00 0f b7 8b b2 00 00 00 49 01 ec 48 01 c5
> > 48 8d
> > [14080.047181] RSP: 0018:ffffa83643a4f818 EFLAGS: 00010246
> > [14080.052522] RAX: 00000000000000ce RBX: 0000000000000000 RCX: 0000000000000000
> > [14080.059765] RDX: ffffa2ad1403b000 RSI: 0000000000000028 RDI: ffffa2afc9d302d4
> > [14080.067012] RBP: ffffa2ad1403b000 R08: 0000000000000022 R09: 00002000001558c9
> > [14080.074257] R10: 0000000000000000 R11: ffffa2b02fcea888 R12: 00000000000000e2
> > [14080.081502] R13: ffffa2afc9d30200 R14: ffffa2afc9d30200 R15: 00002000001558c9
> > [14080.088746] FS: 0000000000000000(0000) GS:ffffa2b02fcc0000(0000)
> > knlGS:0000000000000000
> > [14080.096964] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [14080.102823] CR2: 00000000000000c0 CR3: 0000000151ff4000 CR4: 00000000003526e0
> > [14080.110067] Kernel panic - not syncing: Fatal exception in interrupt
> > [14080.325501] Kernel Offset: 0x12600000 from 0xffffffff81000000
> > (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > [14080.353129] ---[ end Kernel panic - not syncing: Fatal exception in
> > interrupt ]---
>
> Could you please provide a decoded stack trace?
>
> # in your git tree:
> cat <stacktrace file > | ./scripts/decode_stacktrace.sh vmlinux

I'm afraid it doesn't yield much more information, and I can't say why:

cat bug.txt | ./scripts/decode_stacktrace.sh vmlinux
[14079.678380] BUG: kernel NULL pointer dereference, address: 00000000000000c0
[14079.685456] #PF: supervisor read access in kernel mode
[14079.690686] #PF: error_code(0x0000) - not-present page
[14079.695915] PGD 0 P4D 0
[14079.698540] Oops: 0000 [#1] PREEMPT SMP NOPTI
[14079.702996] CPU: 11 PID: 891 Comm: napi/eno2-80 Not tainted 6.4.0 #360
[14079.709614] Hardware name: Supermicro Super Server/A2SDi-12C-HLN4F,
BIOS 1.7a 10/13/2022
[14079.717796] RIP: 0010:__udp_gso_segment (??:?)
[14079.722778] Code: c3 08 66 89 5c 02 04 45 84 e4 0f 85 27 fd ff ff

Code starting with the faulting instruction
===========================================
0: c3 ret
1: 08 66 89 or %ah,-0x77(%rsi)
4: 5c pop %rsp
5: 02 04 45 84 e4 0f 85 add -0x7af01b7c(,%rax,2),%al
c: 27 (bad)
d: fd std
e: ff (bad)
f: ff .byte 0xff
49 8b 1e 49 8b ae c0 00 00 00 41 0f b7 86 b4 00 00 00 45 0f b7 a6 b2
00 00 00 <48> 8b b3 c0 00 00 00 0f b7 8b b2 00 00 00 49 01 ec 48 01 c5
48 8d
[14079.741645] RSP: 0018:ffffa83643a4f818 EFLAGS: 00010246
[14079.746966] RAX: 00000000000000ce RBX: 0000000000000000 RCX: 0000000000000000
[14079.754195] RDX: ffffa2ad1403b000 RSI: 0000000000000028 RDI: ffffa2afc9d302d4
[14079.761422] RBP: ffffa2ad1403b000 R08: 0000000000000022 R09: 00002000001558c9
[14079.768650] R10: 0000000000000000 R11: ffffa2b02fcea888 R12: 00000000000000e2
[14079.775879] R13: ffffa2afc9d30200 R14: ffffa2afc9d30200 R15: 00002000001558c9
[14079.783106] FS: 0000000000000000(0000) GS:ffffa2b02fcc0000(0000)
knlGS:0000000000000000
[14079.791305] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[14079.797162] CR2: 00000000000000c0 CR3: 0000000151ff4000 CR4: 00000000003526e0
[14079.804408] Call Trace:
[14079.806961] <TASK>
[14079.809170] ? __die (??:?)
[14079.812340] ? page_fault_oops (fault.c:?)
[14079.816551] ? ip6_route_output_flags (??:?)
[14079.821284] ? exc_page_fault (??:?)
[14079.825408] ? update_load_avg (fair.c:?)
[14079.829534] ? asm_exc_page_fault (??:?)
[14079.833836] ? __udp_gso_segment (??:?)
[14079.838218] ? __udp_gso_segment (??:?)
[14079.842600] ? _raw_spin_unlock_irqrestore (??:?)
[14079.847679] ? try_to_wake_up (core.c:?)
[14079.851713] inet_gso_segment (??:?)
[14079.855827] ? vhost_poll_wakeup (vhost.c:?)
[14079.860032] skb_mac_gso_segment (??:?)
[14079.864331] __skb_gso_segment (??:?)
[14079.868455] ? netif_skb_features (??:?)
[14079.872928] validate_xmit_skb (dev.c:?)
[14079.877139] validate_xmit_skb_list (??:?)
[14079.881612] sch_direct_xmit (??:?)
[14079.885641] __qdisc_run (??:?)
[14079.889324] __dev_queue_xmit (??:?)
[14079.893450] ? nf_hook_slow (??:?)
[14079.897229] br_dev_queue_push_xmit (??:?)
[14079.901788] maybe_deliver (br_forward.c:?)
[14079.905564] br_flood (??:?)
[14079.908903] br_handle_frame_finish (??:?)
[14079.913547] br_handle_frame (br_input.c:?)
[14079.917585] ? ipv6_find_hdr (??:?)
[14079.921622] ? br_handle_local_finish (??:?)
[14079.926267] __netif_receive_skb_core.constprop.0 (dev.c:?)
[14079.932125] ? br_handle_frame_finish (br_input.c:?)
[14079.936946] ? ___slab_alloc (slub.c:?)
[14079.940986] __netif_receive_skb_list_core (dev.c:?)
[14079.946240] netif_receive_skb_list_internal (??:?)
[14079.951660] ? napi_gro_flush (??:?)
[14079.955604] napi_complete_done (??:?)
[14079.959808] ixgbe_poll (??:?)
[14079.963506] __napi_poll (dev.c:?)
[14079.967106] napi_threaded_poll (dev.c:?)
[14079.971405] ? __napi_poll (dev.c:?)
[14079.975260] kthread (kthread.c:?)
[14079.978510] ? kthread_complete_and_exit (kthread.c:?)
[14079.983415] ret_from_fork (??:?)
[14079.987102] </TASK>
[14079.989395] Modules linked in: chaoskey
[14079.993347] CR2: 00000000000000c0
[14079.996773] ---[ end trace 0000000000000000 ]---
[14080.018013] pstore: backend (erst) writing error (-28)
[14080.023274] RIP: 0010:__udp_gso_segment (??:?)
[14080.028264] Code: c3 08 66 89 5c 02 04 45 84 e4 0f 85 27 fd ff ff

Code starting with the faulting instruction
===========================================
0: c3 ret
1: 08 66 89 or %ah,-0x77(%rsi)
4: 5c pop %rsp
5: 02 04 45 84 e4 0f 85 add -0x7af01b7c(,%rax,2),%al
c: 27 (bad)
d: fd std
e: ff (bad)
f: ff .byte 0xff
49 8b 1e 49 8b ae c0 00 00 00 41 0f b7 86 b4 00 00 00 45 0f b7 a6 b2
00 00 00 <48> 8b b3 c0 00 00 00 0f b7 8b b2 00 00 00 49 01 ec 48 01 c5
48 8d
[14080.047181] RSP: 0018:ffffa83643a4f818 EFLAGS: 00010246
[14080.052522] RAX: 00000000000000ce RBX: 0000000000000000 RCX: 0000000000000000
[14080.059765] RDX: ffffa2ad1403b000 RSI: 0000000000000028 RDI: ffffa2afc9d302d4
[14080.067012] RBP: ffffa2ad1403b000 R08: 0000000000000022 R09: 00002000001558c9
[14080.074257] R10: 0000000000000000 R11: ffffa2b02fcea888 R12: 00000000000000e2
[14080.081502] R13: ffffa2afc9d30200 R14: ffffa2afc9d30200 R15: 00002000001558c9
[14080.088746] FS: 0000000000000000(0000) GS:ffffa2b02fcc0000(0000)
knlGS:0000000000000000
[14080.096964] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[14080.102823] CR2: 00000000000000c0 CR3: 0000000151ff4000 CR4: 00000000003526e0
[14080.110067] Kernel panic - not syncing: Fatal exception in interrupt
[14080.325501] Kernel Offset: 0x12600000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[14080.353129] ---[ end Kernel panic - not syncing: Fatal exception in
interrupt ]---
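
Even without file and line numbers, the Code: bytes and the register dump
can be read by hand. The bytes at the <48> marker, 48 8b b3 c0 00 00 00,
decode as a 64-bit load from 0xc0(%rbx), and RBX is 0 in the dump, which
matches CR2 = 00000000000000c0. A few instructions earlier, 49 8b 1e loads
RBX from (%r14), so the NULL was read from offset 0 of whatever R14 points
to. A minimal sketch of that arithmetic follows; the helper name is made up
for illustration and it handles only this one encoding (REX.W + 0x8b with a
base register and a 32-bit displacement), not x86 in general:

```python
import struct

# 64-bit general purpose registers in x86-64 encoding order.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_mov_load(code: bytes):
    """Decode 'REX.W 8b /r' with mod=10 (base + disp32) and no SIB byte.

    Returns (dest_reg, base_reg, displacement). Hypothetical helper that
    rejects every other instruction form.
    """
    rex, opcode, modrm = code[0], code[1], code[2]
    if rex & 0xF8 != 0x48 or opcode != 0x8B:
        raise ValueError("not a 64-bit 'mov r/m64 -> r64'")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only base + disp32 without SIB is handled")
    reg |= (rex & 0x4) << 1  # REX.R extends the destination register
    rm |= (rex & 0x1) << 3   # REX.B extends the base register
    (disp,) = struct.unpack_from("<i", code, 3)
    return REGS[reg], REGS[rm], disp


# Bytes starting at the <48> marker in the Code: line of the oops.
faulting = bytes.fromhex("488bb3c0000000")
dest, base, disp = decode_mov_load(faulting)

regs = {"rbx": 0x0}  # RBX as shown in the register dump
fault_addr = regs[base] + disp
print(f"mov {disp:#x}(%{base}),%{dest} -> fault at {fault_addr:#x}")
```

This only confirms that the faulting access is a read through a NULL pointer
held in RBX; it does not say which source line or structure field is
involved.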

The binaries aren't stripped, so I don't currently know why it looks like this...

But I also get:
gdb vmlinux
GNU gdb (Gentoo 13.2 vanilla) 13.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from vmlinux...
(No debugging symbols found in vmlinux)
Traceback (most recent call last):
File "/usr/src/linux/vmlinux-gdb.py", line 25, in <module>
import linux.constants
File "/usr/src/linux/scripts/gdb/linux/constants.py", line 10, in <module>
LX_hrtimer_resolution = gdb.parse_and_eval("hrtimer_resolution")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
gdb.error: 'hrtimer_resolution' has unknown type; cast it to its declared type
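
The "(No debugging symbols found in vmlinux)" line is probably the
explanation for the (??:?) output as well: a binary can be unstripped
(symbol table intact, so function names resolve) and still carry no DWARF
data, which is what both decode_stacktrace.sh and the gdb scripts need for
file and line lookups. That usually means the kernel was built without
CONFIG_DEBUG_INFO, though I have not verified the config here. A rough way
to check, as a sketch only: the function names are hypothetical, and it
assumes a little-endian ELF64 image and ignores the extended section
numbering used by files with a very large number of sections.

```python
import struct


def elf64_section_names(path):
    """List the section names of a little-endian ELF64 file (sketch only)."""
    with open(path, "rb") as f:
        ehdr = f.read(64)
        if (len(ehdr) < 64 or ehdr[:4] != b"\x7fELF"
                or ehdr[4] != 2 or ehdr[5] != 1):
            raise ValueError("not a little-endian ELF64 file")
        # e_shoff at 0x28; e_shentsize, e_shnum, e_shstrndx from 0x3a.
        (shoff,) = struct.unpack_from("<Q", ehdr, 0x28)
        shentsize, shnum, shstrndx = struct.unpack_from("<HHH", ehdr, 0x3A)
        if shnum == 0:
            return []
        f.seek(shoff)
        table = f.read(shentsize * shnum)
        # Leading fields of each section header:
        # sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size
        headers = [struct.unpack_from("<IIQQQQ", table, i * shentsize)
                   for i in range(shnum)]
        str_off, str_size = headers[shstrndx][4], headers[shstrndx][5]
        f.seek(str_off)
        strtab = f.read(str_size)
    names = []
    for sh_name, *_ in headers:
        end = strtab.index(b"\0", sh_name)  # names are NUL-terminated
        names.append(strtab[sh_name:end].decode())
    return names


def has_dwarf(path):
    """True if the file has a (possibly compressed) .debug_info section."""
    return any(name.startswith((".debug_info", ".zdebug_info"))
               for name in elf64_section_names(path))
```

Calling has_dwarf("vmlinux") from the build tree should return False for
this kernel if the guess above is right, and True after a rebuild with debug
info enabled.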
---

> Thanks!
>
> Paolo
>