Re: Kernel panic in netif_rx_internal after v6 pings between netns

From: Eric Dumazet
Date: Tue Jan 16 2024 - 14:17:51 EST


On Tue, Jan 16, 2024 at 7:36 PM Matthieu Baerts <matttbe@xxxxxxxxxx> wrote:
>
> Hello,
>
> Our MPTCP CIs recently hit some kernel panics when validating the -net
> tree + 2 pending MPTCP patches. This is on top of e327b2372bc0 ("net:
> ravb: Fix dma_addr_t truncation in error case").
>
> It looks like these panics are not related to MPTCP. That's why I'm
> sharing that here:

Indeed, this looks like an x86 issue to me (jump labels?). Are all stack
traces pointing to the same issue?

Let's cc lkml just in case this rings a bell.
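
One way to check the jump-label suspicion against the quoted oops is to look at where the saved RIP sits in the "Code:" line. The byte in angle brackets is the one at the saved RIP, and after an int3 trap the saved RIP points just past the one-byte int3 (0xcc). The sketch below is only an illustration (the helper name locate_rip is hypothetical): it finds the marked byte and decodes the instruction that starts one byte earlier. If that byte now reads as the first byte of a 5-byte jmp, that would be consistent with an int3 having temporarily occupied it while the jump label was being patched. This is an interpretation, not something confirmed in this thread.

```python
# Bytes copied from the "Code:" line of the quoted oops.
CODE_LINE = (
    "0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd "
    "48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 "
    "<c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11"
)


def locate_rip(code_line):
    """Return (raw bytes, index of the byte marked <xx>, i.e. the saved RIP)."""
    tokens = code_line.split()
    rip_index = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    data = bytes(int(tok.strip("<>"), 16) for tok in tokens)
    return data, rip_index


data, rip = locate_rip(CODE_LINE)

# The trap was raised by the single byte just before the saved RIP.
trap_offset = rip - 1
opcode_now = data[trap_offset]

# 0xe9 is "jmp rel32": opcode plus a 4-byte little-endian signed displacement.
rel32 = int.from_bytes(data[rip:rip + 4], "little", signed=True)
jmp_target = trap_offset + 5 + rel32

print(f"saved RIP at offset {rip:#x}, trapping byte at {trap_offset:#x}")
print(f"byte there now: {opcode_now:#x}, jmp target offset: {jmp_target:#x}")
```

The offsets are relative to the start of the dumped bytes, the same convention used in the decoded listing of the quoted report, so the result can be compared directly with the line flagged there as the trapping instruction.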

>
> > # INFO: validating network environment with pings
> > [ 45.505495] int3: 0000 [#1] PREEMPT SMP NOPTI
> > [ 45.505547] CPU: 1 PID: 1070 Comm: ping Tainted: G N 6.7.0-g244ee3389ffe #1
> > [ 45.505547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> > [ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
> > [ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11
> > All code
> > ========
> > 0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
> > 7: 00
> > 8: 0f 1f 40 00 nopl 0x0(%rax)
> > c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> > 11: 55 push %rbp
> > 12: 48 89 fd mov %rdi,%rbp
> > 15: 48 83 ec 20 sub $0x20,%rsp
> > 19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
> > 20: 00 00
> > 22: 48 89 44 24 18 mov %rax,0x18(%rsp)
> > 27: 31 c0 xor %eax,%eax
> > 29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction
> > 2e: 66 90 xchg %ax,%ax
> > 30: 66 90 xchg %ax,%ax
> > 32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
> > 37: 48 89 ef mov %rbp,%rdi
> > 3a: 65 gs
> > 3b: 8b .byte 0x8b
> > 3c: 35 .byte 0x35
> > 3d: 17 (bad)
> > 3e: 9d popf
> > 3f: 11 .byte 0x11
> >
> > Code starting with the faulting instruction
> > ===========================================
> > 0: c9 leave
> > 1: 00 00 add %al,(%rax)
> > 3: 00 66 90 add %ah,-0x70(%rsi)
> > 6: 66 90 xchg %ax,%ax
> > 8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
> > d: 48 89 ef mov %rbp,%rdi
> > 10: 65 gs
> > 11: 8b .byte 0x8b
> > 12: 35 .byte 0x35
> > 13: 17 (bad)
> > 14: 9d popf
> > 15: 11 .byte 0x11
> > [ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246
> > [ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000
> > [ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400
> > [ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068
> > [ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400
> > [ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000
> > [ 45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000
> > [ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0
> > [ 45.505547] Call Trace:
> > [ 45.505547] <IRQ>
> > [ 45.505547] ? die (arch/x86/kernel/dumpstack.c:421)
> > [ 45.505547] ? exc_int3 (arch/x86/kernel/traps.c:762)
> > [ 45.505547] ? asm_exc_int3 (arch/x86/include/asm/idtentry.h:569)
> > [ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
> > [ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
> > [ 45.505547] __netif_rx (net/core/dev.c:5084)
> > [ 45.505547] veth_xmit (drivers/net/veth.c:321)
> > [ 45.505547] dev_hard_start_xmit (include/linux/netdevice.h:4989)
> > [ 45.505547] __dev_queue_xmit (include/linux/netdevice.h:3367)
> > [ 45.505547] ? selinux_ip_postroute_compat (security/selinux/hooks.c:5783)
> > [ 45.505547] ? eth_header (net/ethernet/eth.c:85)
> > [ 45.505547] ip6_finish_output2 (include/net/neighbour.h:542)
> > [ 45.505547] ? ip6_output (include/linux/netfilter.h:301)
> > [ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208)
> > [ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953)
> > [ 45.505547] icmpv6_echo_reply (net/ipv6/icmp.c:812)
> > [ 45.505547] ? icmpv6_rcv (net/ipv6/icmp.c:939)
> > [ 45.505547] icmpv6_rcv (net/ipv6/icmp.c:939)
> > [ 45.505547] ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:440)
> > [ 45.505547] ip6_input_finish (include/linux/rcupdate.h:779)
> > [ 45.505547] __netif_receive_skb_one_core (net/core/dev.c:5537)
> > [ 45.505547] process_backlog (include/linux/rcupdate.h:779)
> > [ 45.505547] __napi_poll (net/core/dev.c:6576)
> > [ 45.505547] net_rx_action (net/core/dev.c:6647)
> > [ 45.505547] __do_softirq (arch/x86/include/asm/jump_label.h:27)
> > [ 45.505547] do_softirq (kernel/softirq.c:454)
> > [ 45.505547] </IRQ>
> > [ 45.505547] <TASK>
> > [ 45.505547] __local_bh_enable_ip (kernel/softirq.c:381)
> > [ 45.505547] __dev_queue_xmit (net/core/dev.c:4379)
> > [ 45.505547] ip6_finish_output2 (include/linux/netdevice.h:3171)
> > [ 45.505547] ? ip6_output (include/linux/netfilter.h:301)
> > [ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208)
> > [ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953)
> > [ 45.505547] rawv6_sendmsg (net/ipv6/raw.c:584)
> > [ 45.505547] ? netfs_clear_subrequests (include/linux/list.h:373)
> > [ 45.505547] ? netfs_alloc_request (fs/netfs/objects.c:42)
> > [ 45.505547] ? folio_add_file_rmap_ptes (arch/x86/include/asm/bitops.h:206)
> > [ 45.505547] ? set_pte_range (mm/memory.c:4529)
> > [ 45.505547] ? next_uptodate_folio (include/linux/xarray.h:1699)
> > [ 45.505547] ? __sock_sendmsg (net/socket.c:733)
> > [ 45.505547] __sock_sendmsg (net/socket.c:733)
> > [ 45.505547] ? move_addr_to_kernel.part.0 (net/socket.c:253)
> > [ 45.505547] __sys_sendto (net/socket.c:2191)
> > [ 45.505547] ? __hrtimer_run_queues (include/linux/seqlock.h:566)
> > [ 45.505547] ? __do_softirq (arch/x86/include/asm/jump_label.h:27)
> > [ 45.505547] __x64_sys_sendto (net/socket.c:2203)
> > [ 45.505547] do_syscall_64 (arch/x86/entry/common.c:52)
> > [ 45.505547] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
> > [ 45.505547] RIP: 0033:0x7fa1d099ca0a
> > [ 45.505547] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
> > All code
> > ========
> > 0: d8 64 89 02 fsubs 0x2(%rcx,%rcx,4)
> > 4: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
> > b: eb b8 jmp 0xffffffffffffffc5
> > d: 0f 1f 00 nopl (%rax)
> > 10: f3 0f 1e fa endbr64
> > 14: 41 89 ca mov %ecx,%r10d
> > 17: 64 8b 04 25 18 00 00 mov %fs:0x18,%eax
> > 1e: 00
> > 1f: 85 c0 test %eax,%eax
> > 21: 75 15 jne 0x38
> > 23: b8 2c 00 00 00 mov $0x2c,%eax
> > 28: 0f 05 syscall
> > 2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
> > 30: 77 7e ja 0xb0
> > 32: c3 ret
> > 33: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> > 38: 41 54 push %r12
> > 3a: 48 83 ec 30 sub $0x30,%rsp
> > 3e: 44 rex.R
> > 3f: 89 .byte 0x89
> >
> > Code starting with the faulting instruction
> > ===========================================
> > 0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
> > 6: 77 7e ja 0x86
> > 8: c3 ret
> > 9: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> > e: 41 54 push %r12
> > 10: 48 83 ec 30 sub $0x30,%rsp
> > 14: 44 rex.R
> > 15: 89 .byte 0x89
> > [ 45.505547] RSP: 002b:00007ffe47710958 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
> > [ 45.505547] RAX: ffffffffffffffda RBX: 00007ffe47712090 RCX: 00007fa1d099ca0a
> > [ 45.505547] RDX: 0000000000000040 RSI: 0000559b91bbd300 RDI: 0000000000000003
> > [ 45.505547] RBP: 0000559b91bbd300 R08: 00007ffe477142a4 R09: 000000000000001c
> > [ 45.505547] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe47711c20
> > [ 45.505547] R13: 0000000000000040 R14: 0000559b91bbf4f4 R15: 00007ffe47712090
> > [ 45.505547] </TASK>
> > [ 45.505547] Modules linked in: mptcp_diag inet_diag mptcp_token_test mptcp_crypto_test kunit
> > [ 45.505547] ---[ end trace 0000000000000000 ]---
> > [ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
> > [ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11
> > All code
> > ========
> > 0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
> > 7: 00
> > 8: 0f 1f 40 00 nopl 0x0(%rax)
> > c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> > 11: 55 push %rbp
> > 12: 48 89 fd mov %rdi,%rbp
> > 15: 48 83 ec 20 sub $0x20,%rsp
> > 19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
> > 20: 00 00
> > 22: 48 89 44 24 18 mov %rax,0x18(%rsp)
> > 27: 31 c0 xor %eax,%eax
> > 29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction
> > 2e: 66 90 xchg %ax,%ax
> > 30: 66 90 xchg %ax,%ax
> > 32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
> > 37: 48 89 ef mov %rbp,%rdi
> > 3a: 65 gs
> > 3b: 8b .byte 0x8b
> > 3c: 35 .byte 0x35
> > 3d: 17 (bad)
> > 3e: 9d popf
> > 3f: 11 .byte 0x11
> >
> > Code starting with the faulting instruction
> > ===========================================
> > 0: c9 leave
> > 1: 00 00 add %al,(%rax)
> > 3: 00 66 90 add %ah,-0x70(%rsi)
> > 6: 66 90 xchg %ax,%ax
> > 8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
> > d: 48 89 ef mov %rbp,%rdi
> > 10: 65 gs
> > 11: 8b .byte 0x8b
> > 12: 35 .byte 0x35
> > 13: 17 (bad)
> > 14: 9d popf
> > 15: 11 .byte 0x11
> > [ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246
> > [ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000
> > [ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400
> > [ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068
> > [ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400
> > [ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000
> > [ 45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000
> > [ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0
> > [ 45.505547] Kernel panic - not syncing: Fatal exception in interrupt
> > [ 45.505547] Kernel Offset: 0x37600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>
>
> When hitting the panic, the MPTCP selftest was doing some pings -- v6
> according to the call trace -- between different netns: client, server,
> 2 routers in between with some TC config. See [1] for more details. In
> other words, that's before creating MPTCP connections.
>
> These panics are not easy to reproduce. In fact, we have only seen the issue 2
> (maybe 3) times, and only when running on GitHub Actions (without KVM). I
> didn't manage to reproduce it locally.
>
> It is only recently that we have started to use GitHub Actions to do
> some validations, so I cannot confirm that it is a very recent issue. I
> think the CI hit the same issue a few days ago, on top of bec161add35b
> ("amt: do not use overwrapped cb area"), but there was another issue and
> the debug info was not stored.
>
> For reference, I originally added info in a GitHub issue [2]. If the CI
> hits the same bug again, I will add the stack trace there. Please tell me if
> I should cc someone.
>
> If you have any idea what is causing such a panic, please tell me. I can
> also add test patches in the MPTCP tree if needed.
>
>
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/tree/tools/testing/selftests/net/mptcp/mptcp_connect.sh?id=e327b2372bc0#n171
>
> [2]
> https://github.com/multipath-tcp/mptcp_net-next/issues/471#issuecomment-1894061756
>
>
> Cheers,
> Matt
> --
> Sponsored by the NGI0 Core fund.