Re: Kernel panic in netif_rx_internal after v6 pings between netns

From: Matthieu Baerts
Date: Tue Jan 16 2024 - 17:23:42 EST


Hi Eric,

Thank you for your quick reply!

16 Jan 2024 20:17:40 Eric Dumazet <edumazet@xxxxxxxxxx>:
> On Tue, Jan 16, 2024 at 7:36 PM Matthieu Baerts <matttbe@xxxxxxxxxx> wrote:
>> Our MPTCP CIs recently hit some kernel panics when validating the -net
>> tree + 2 pending MPTCP patches. This is on top of e327b2372bc0 ("net:
>> ravb: Fix dma_addr_t truncation in error case").
>>
>> It looks like these panics are not related to MPTCP. That's why I'm
>> sharing that here:
>
> Indeed, this seems an x86 issue to me (jump labels ?)

Thank you, good point!

(I don't know why I always think there is no x86 issue :) )

> are all stack
> traces pointing to the same issue ?

I think so.

We hit the same stack trace twice, plus another one that was sadly not
decoded. In both cases, it happened while doing the same thing (ping6):


# INFO: validating network environment with pings
[ 2211.138427] int3: 0000 [#1] PREEMPT SMP NOPTI
[ 2211.138427] CPU: 0 PID: 21830 Comm: ping Tainted: G                 N 6.7.0-gc6465fa4649b #1
[ 2211.138427] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 2211.138427] RIP: 0010:__netif_receive_skb_core.constprop.0+0x39/0x10b0
[ 2211.138427] Code: 54 55 53 48 83 ec 78 48 8b 2f 48 89 7c 24 10 48 89 54 24 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 70 31 c0 48 89 6c 24 30 e9 <13> 08 00 00 0f 1f 44 00 00 48 8b 85 c8 00 00 00 48 2b 85 c0 00 00
[ 2211.138427] RSP: 0018:ffffb09700003e00 EFLAGS: 00000246
[ 2211.138427] RAX: 0000000000000000 RBX: ffff9eec3dc2ef10 RCX: ffff9eebc6205700
[ 2211.138427] RDX: ffffb09700003eb8 RSI: 0000000000000000 RDI: ffffb09700003eb0
[ 2211.138427] RBP: ffff9eebc6205700 R08: 0000000000000000 R09: 0000000000000048
[ 2211.138427] R10: 00000000000002ff R11: 020000ff01000000 R12: ffff9eebc82b5000
[ 2211.138427] R13: ffff9eec3dc2ee10 R14: 0000000000000000 R15: 0000000000000002
[ 2211.138427] FS:  00007fa1f295b1c0(0000) GS:ffff9eec3dc00000(0000) knlGS:0000000000000000
[ 2211.138427] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2211.138427] CR2: 00005595dc9df240 CR3: 0000000004758000 CR4: 00000000000006f0
[ 2211.138427] Call Trace:
[ 2211.138427]  <IRQ>
[ 2211.138427]  ? die+0x37/0x90
[ 2211.138427]  ? exc_int3+0x10b/0x110
[ 2211.138427]  ? asm_exc_int3+0x39/0x40
[ 2211.138427]  ? __netif_receive_skb_core.constprop.0+0x39/0x10b0
[ 2211.138427]  ? __netif_receive_skb_core.constprop.0+0x39/0x10b0
[ 2211.138427]  ? ip6_finish_output2+0x209/0x670
[ 2211.138427]  ? ip6_output+0x12d/0x150
[ 2211.138427]  ? unix_stream_read_generic+0x7c4/0xb70
[ 2211.138427]  ? ip6_mtu+0x46/0x50
[ 2211.138427]  __netif_receive_skb_one_core+0x3d/0x80
[ 2211.138427]  process_backlog+0x9d/0x140
[ 2211.138427]  __napi_poll+0x26/0x1b0
[ 2211.138427]  net_rx_action+0x28f/0x300
[ 2211.138427]  __do_softirq+0xc0/0x28b
[ 2211.138427]  do_softirq+0x43/0x60
[ 2211.138427]  </IRQ>
[ 2211.138427]  <TASK>
[ 2211.138427]  __local_bh_enable_ip+0x5c/0x70
[ 2211.138427]  __dev_queue_xmit+0x28e/0xd70
[ 2211.138427]  ip6_finish_output2+0x2d8/0x670
[ 2211.138427]  ? ip6_output+0x12d/0x150
[ 2211.138427]  ? ip6_mtu+0x46/0x50
[ 2211.138427]  ip6_send_skb+0x22/0x70
[ 2211.138427]  rawv6_sendmsg+0xda5/0x10c0
[ 2211.138427]  ? netfs_clear_subrequests+0x63/0x80
[ 2211.138427]  ? netfs_alloc_request+0xec/0x130
[ 2211.138427]  ? folio_add_file_rmap_ptes+0x88/0xb0
[ 2211.138427]  ? set_pte_range+0xe8/0x310
[ 2211.138427]  ? next_uptodate_folio+0x85/0x260
[ 2211.138427]  ? __sock_sendmsg+0x38/0x70
[ 2211.138427]  __sock_sendmsg+0x38/0x70
[ 2211.138427]  ? move_addr_to_kernel.part.0+0x1b/0x60
[ 2211.138427]  __sys_sendto+0xfc/0x160
[ 2211.138427]  ? ktime_get_real_ts64+0x4d/0xf0
[ 2211.138427]  __x64_sys_sendto+0x24/0x30
[ 2211.138427]  do_syscall_64+0xad/0x1a0
[ 2211.138427]  entry_SYSCALL_64_after_hwframe+0x63/0x6b
[ 2211.138427] RIP: 0033:0x7fa1f2c2da0a
[ 2211.138427] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 2211.138427] RSP: 002b:00007fff0d984668 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 2211.138427] RAX: ffffffffffffffda RBX: 00007fff0d985da0 RCX: 00007fa1f2c2da0a
[ 2211.138427] RDX: 0000000000000040 RSI: 00005595dcf1d300 RDI: 0000000000000003
[ 2211.138427] RBP: 00005595dcf1d300 R08: 00007fff0d987fb4 R09: 000000000000001c
[ 2211.138427] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff0d985930
[ 2211.138427] R13: 0000000000000040 R14: 00005595dcf1f4f4 R15: 00007fff0d985da0
[ 2211.138427]  </TASK>
[ 2211.138427] Modules linked in: tcp_diag act_csum act_pedit cls_fw sch_ingress xt_mark xt_statistic xt_length xt_bpf ipt_REJECT nft_tproxy nf_tproxy_ipv6 nf_tproxy_ipv4 nft_socket nf_socket_ipv4 nf_socket_ipv6 nf_tables sch_netem mptcp_diag inet_diag mptcp_token_test mptcp_crypto_test kunit
[ 2211.138427] ---[ end trace 0000000000000000 ]---
[ 2211.138427] RIP: 0010:__netif_receive_skb_core.constprop.0+0x39/0x10b0
[ 2211.138427] Code: 54 55 53 48 83 ec 78 48 8b 2f 48 89 7c 24 10 48 89 54 24 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 70 31 c0 48 89 6c 24 30 e9 <13> 08 00 00 0f 1f 44 00 00 48 8b 85 c8 00 00 00 48 2b 85 c0 00 00
[ 2211.138427] RSP: 0018:ffffb09700003e00 EFLAGS: 00000246
[ 2211.138427] RAX: 0000000000000000 RBX: ffff9eec3dc2ef10 RCX: ffff9eebc6205700
[ 2211.138427] RDX: ffffb09700003eb8 RSI: 0000000000000000 RDI: ffffb09700003eb0
[ 2211.138427] RBP: ffff9eebc6205700 R08: 0000000000000000 R09: 0000000000000048
[ 2211.138427] R10: 00000000000002ff R11: 020000ff01000000 R12: ffff9eebc82b5000
[ 2211.138427] R13: ffff9eec3dc2ee10 R14: 0000000000000000 R15: 0000000000000002
[ 2211.138427] FS:  00007fa1f295b1c0(0000) GS:ffff9eec3dc00000(0000) knlGS:0000000000000000
[ 2211.138427] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2211.138427] CR2: 00005595dc9df240 CR3: 0000000004758000 CR4: 00000000000006f0
[ 2211.138427] Kernel panic - not syncing: Fatal exception in interrupt
[ 2211.138427] Kernel Offset: 0x1c400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

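To check the jump-label idea against the "Code:" line above, a small
Python sketch like the one below can help. It only relies on two
assumptions: that the byte shown in <..> is the one at the reported RIP
(the usual oops convention), and that for an int3 trap the reported RIP
is one byte past the trapping instruction. The helper name split_at_rip
and the constants are made up for this illustration; nothing here comes
from the kernel tree.

```python
# "Code:" bytes copied from the oops above; <..> marks the byte at the reported RIP.
CODE = ("54 55 53 48 83 ec 78 48 8b 2f 48 89 7c 24 10 48 89 54 24 20 "
        "65 48 8b 04 25 28 00 00 00 48 89 44 24 70 31 c0 48 89 6c 24 30 "
        "e9 <13> 08 00 00 0f 1f 44 00 00 48 8b 85 c8 00 00 00 48 2b 85 c0 00 00")

# Offset taken from "__netif_receive_skb_core.constprop.0+0x39/0x10b0".
RIP_OFFSET = 0x39


def split_at_rip(code):
    """Split the dump into the bytes before RIP and the bytes from RIP onwards."""
    tokens = code.split()
    idx = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    before = bytes(int(t, 16) for t in tokens[:idx])
    after = bytes(int(t.strip("<>"), 16) for t in tokens[idx:])
    return before, after


before, after = split_at_rip(CODE)

# Assumption: the int3 sat one byte before the reported RIP.
insn_start = RIP_OFFSET - 1
opcode = before[-1]

# If that byte is now 0xe9 (jmp rel32), the next 4 bytes are the displacement.
rel32 = int.from_bytes(after[:4], "little", signed=True)
target = insn_start + 5 + rel32  # relative to the start of the function

# The 5 bytes right after the jump.
following = after[4:9]

print(f"byte at +{insn_start:#x}: {opcode:#04x}")
print(f"rel32: {rel32:#x}, jump target: +{target:#x}")
print("next 5 bytes:", following.hex(" "))
```

With the dump above, the byte just before RIP is 0xe9 (a 5-byte jmp) and
the next five bytes are 0f 1f 44 00 00 (a 5-byte NOP). That would fit a
site where the first byte was temporarily an int3 while being patched,
but this is only a reading of the dump, not a confirmed diagnosis.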

> Let's cc lkml just in case this rings a bell

Thank you! Hopefully there are still people reading lkml :)

Cheers,
Matt
--
Sponsored by the NGI0 Core fund.