Re: [PATCH net] udp: fix segmentation crash for untrusted source packet

From: Lena Wang (王娜)
Date: Tue Mar 26 2024 - 09:11:01 EST


On Sat, 2024-03-16 at 09:47 -0400, Willem de Bruijn wrote:
>
> Lena Wang (王娜) wrote:
> > On Wed, 2024-03-13 at 16:41 +0100, Paolo Abeni wrote:
> > >
> > > On Wed, 2024-03-13 at 21:34 +0800, Shiming Cheng wrote:
> > > > Kernel exception is reported when making udp frag list segmentation.
> > > > Backtrace is as below:
> > > > at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:229
> > > > at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:262
> > > > features=features@entry=19, is_ipv6=false)
> > > > at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:289
> > > > features=19)
> > > > at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:399
> > > > features=19)
> > > > at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/af_inet.c:1418
> > > > skb@entry=0x0, features=19, features@entry=0)
> > > > at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:53
> > > > tx_path=<optimized out>)
> > > > at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:124
> > >
> > > A full backtrace would help better understanding the issue.
> >
> > Below is full backtrace:
> > [ 1100.812205][ C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted:
> > G W OE 6.6.17-android15-0-g380371ea9bf1 #1
> > [ 1100.812211][ C3] Hardware name: MT6991(ENG) (DT)
> > [ 1100.812215][ C3] Call trace:
> > [ 1100.812218][ C3] dump_backtrace+0xec/0x138
> > [ 1100.812222][ C3] show_stack+0x18/0x24
> > [ 1100.812226][ C3] dump_stack_lvl+0x50/0x6c
> > [ 1100.812232][ C3] dump_stack+0x18/0x24
> > [ 1100.812237][ C3] mrdump_common_die+0x24c/0x388 [mrdump]
> > [ 1100.812259][ C3] ipanic_die+0x20/0x34 [mrdump]
> > [ 1100.812269][ C3] notifier_call_chain+0x90/0x174
> > [ 1100.812275][ C3] notify_die+0x50/0x8c
> > [ 1100.812279][ C3] die+0x94/0x308
> > [ 1100.812283][ C3] __do_kernel_fault+0x240/0x26c
> > [ 1100.812288][ C3] do_page_fault+0xa0/0x48c
> > [ 1100.812293][ C3] do_translation_fault+0x38/0x54
> > [ 1100.812297][ C3] do_mem_abort+0x58/0x104
> > [ 1100.812302][ C3] el1_abort+0x3c/0x5c
> > [ 1100.812307][ C3] el1h_64_sync_handler+0x54/0x90
> > [ 1100.812313][ C3] el1h_64_sync+0x68/0x6c
> > [ 1100.812317][ C3] __udp_gso_segment+0x298/0x4d4
> > [ 1100.812322][ C3] udp4_ufo_fragment+0x130/0x174
> > [ 1100.812326][ C3] inet_gso_segment+0x164/0x330
> > [ 1100.812330][ C3] skb_mac_gso_segment+0xc4/0x13c
> > [ 1100.812335][ C3] __skb_gso_segment+0xc4/0x120
> > [ 1100.812339][ C3] udp_rcv_segment+0x50/0x134
> > [ 1100.812344][ C3] udp_queue_rcv_skb+0x74/0x114
> > [ 1100.812348][ C3] udp_unicast_rcv_skb+0x94/0xac
> > [ 1100.812353][ C3] __udp4_lib_rcv+0x3e0/0x818
> > [ 1100.812358][ C3] udp_rcv+0x20/0x30
> > [ 1100.812362][ C3] ip_protocol_deliver_rcu+0x194/0x368
> > [ 1100.812368][ C3] ip_local_deliver+0xe4/0x184
> > [ 1100.812373][ C3] ip_rcv+0x90/0x118
> > [ 1100.812378][ C3] __netif_receive_skb+0x74/0x124
> > [ 1100.812383][ C3] process_backlog+0xd8/0x18c
> > [ 1100.812388][ C3] __napi_poll+0x5c/0x1fc
> > [ 1100.812392][ C3] net_rx_action+0x150/0x334
> > [ 1100.812397][ C3] __do_softirq+0x120/0x3f4
> > [ 1100.812401][ C3] ____do_softirq+0x10/0x20
> > [ 1100.812405][ C3] call_on_irq_stack+0x3c/0x74
> > [ 1100.812410][ C3] do_softirq_own_stack+0x1c/0x2c
> > [ 1100.812414][ C3] __irq_exit_rcu+0x5c/0xd4
> > [ 1100.812418][ C3] irq_exit_rcu+0x10/0x1c
> > [ 1100.812422][ C3] el1_interrupt+0x38/0x58
> > [ 1100.812428][ C3] el1h_64_irq_handler+0x18/0x24
> > [ 1100.812434][ C3] el1h_64_irq+0x68/0x6c
> > [ 1100.812437][ C3] arch_local_irq_enable+0x4/0x8
> > [ 1100.812443][ C3] cpuidle_enter+0x38/0x54
> > [ 1100.812449][ C3] do_idle+0x198/0x294
> > [ 1100.812454][ C3] cpu_startup_entry+0x34/0x3c
> > [ 1100.812459][ C3] secondary_start_kernel+0x138/0x158
> > [ 1100.812465][ C3] __secondary_switched+0xc0/0xc4
> >
> > > > This packet's frag list is null while gso_type is not 0. Then it is
> > > > treated as a GRO-ed packet and sent to segment frag list. Function call
> > > > path is
> > > > udp_rcv_segment => config features value
> > > > __udpv4_gso_segment => skb_gso_ok returns false. Here it should be
> > > > true.
> > >
> > > Why? If I read correctly the above, this is GSO packet landing in an
> > > UDP socket with no UDP_GRO sockopt. The packet is expected to be
> > > segmented again.
> > >
> > Yes, it is a GSO packet; however, the fragment list of this GSO packet
> > becomes NULL. As the occurrence rate is very low, we really don't know
> > why and when it becomes NULL. It happens on both cellular and
> > wlan networks and seems to be an unknown kernel issue.
> >
> > To avoid the crash, segmentation should be skipped when the fraglist is
> > null.
> >
> > > > Failed reason is features doesn't match
> > > > gso_type.
> > > > __udp_gso_segment_list
> > > > skb_segment_list => packet is linear with skb->next = NULL
> > > > __udpv4_gso_segment_list_csum => use skb->next directly and
> > > > crash happens
> > > >
> > > > In rx-gro-list GRO-ed packet is set gso type as
> > > > NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST in napi_gro_complete. In gso
> > > > flow the features should also set them to match with gso_type. Or else
> > > > it will always return false in skb_gso_ok. Then it can't discover the
> > > > untrusted source packet and result crash in following function.
> > >
> > > What is the 'untrusted source' here? I read the above as the packet
> > > aggregation happened in the GRO engine???
> > >
> > > Could you please give a complete description of the relevant
> > > scenario?
> > >
> >
> > According to the backtrace info, we infer it is a rx-frag_list GRO
>
> It would be helpful to see an skb_dump. But if this happens rarely in
> production, understood if that is not feasible.
>
> The packet arrives on process_backlog, so still not sure how it is
> produced.
>
Yes, it rarely happens. It is very hard to debug, and we are not sure
which path produces it.

> > packet. Before being delivered to the UDP socket with no UDP_GRO sockopt,
> > it seems to enter "skb_condense", which trims it and loses its frag list.
> > However, it still keeps gso_type and gso_size. Then it continues to do
> > skb_segment_list.
> >
> > First crash happens in skb_segment_list.
> > This patch resolves the crash and lets the packet become an skb without
> > skb->next:
> > https://lore.kernel.org/all/Y9gt5EUizK1UImEP@debian/
> > Then the crash moves to __udp_gso_segment_list -> skb_segment_list(finish)
> > -> __udpv4_gso_segment_list_csum, which uses skb->next without a check
> > and then crashes.
> >
> >
> > What we want to do is to drop this abnormal packet.
>
> I think we want to deliver this packet if possible.
>
> Thanks for the added context. So this is assumed to be a GSO skb with
> SKB_GSO_FRAGLIST that somewhere lost its fraglist? That is the bug
> if true.
>
> You are suggesting that this happens in the skb_condense in
> __udp_enqueue_schedule_skb?
>
We tried adding a skb_condense call directly before skb_segment_list and
got a similar KE backtrace, and the skb dump values were the same as in
this issue's skb dump.
However, we don't know which condition triggers the flow to run into
skb_condense.
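
For illustration only, the suspected failure mode can be modelled in a few
lines of userspace C. This is a sketch under the assumption described
above (a condense step drops the frag list while gso_type and gso_size
stay set); it is not kernel code. The toy_skb struct, the TOY_* flags and
all toy_* helpers are hypothetical names invented for this example.

```c
#include <stddef.h>

/* Hypothetical userspace model; these are NOT the kernel's sk_buff types. */
#define TOY_GSO_UDP_L4   (1u << 0)
#define TOY_GSO_FRAGLIST (1u << 1)

struct toy_skb {
	struct toy_skb *next;      /* next segment in the chain */
	struct toy_skb *frag_list; /* segments chained by fraglist GRO */
	unsigned int gso_type;     /* GSO flags kept in the packet metadata */
	unsigned int gso_size;
};

/* Models the assumed condense step: frag list is lost, GSO metadata stays. */
static void toy_condense(struct toy_skb *skb)
{
	skb->frag_list = NULL;
}

/* Returns 1 when the packet claims fraglist GSO but carries no frag list. */
static int toy_fraglist_inconsistent(const struct toy_skb *skb)
{
	return (skb->gso_type & TOY_GSO_FRAGLIST) && skb->frag_list == NULL;
}

/*
 * Counts the head plus all chained segments. An inconsistent packet is
 * rejected with -1 up front, so no later step walks a NULL pointer.
 */
static int toy_segment_list_count(struct toy_skb *skb)
{
	struct toy_skb *seg;
	int n = 1;

	if (toy_fraglist_inconsistent(skb))
		return -1;
	for (seg = skb->frag_list; seg; seg = seg->next)
		n++;
	return n;
}
```

In this model a head with one chained segment counts as 2, and after
toy_condense() the same head is reported as inconsistent and rejected
rather than dereferenced.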

> If generated by GRO then on a device that has NETIF_F_GRO_FRAGLIST
> set.
> So one workaround (not fix) is to disable that.
>
As we previously hit another GRO issue in skb_segment (the flow with
NETIF_F_GRO_FRAGLIST disabled), it is still not safe to disable GRO
fraglist.

We hope the current patch can be applied to drop the invalid packet.

> > So we set features
> > NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST to match the fixes: f2696099c6c6
> > condition, then drop it.
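
As a rough illustration of why the features value matters for the
skb_gso_ok check, here is a userspace sketch modelled on a shift-and-mask
comparison between the packet's gso_type and the features mask. The bit
positions, the TOY_* names and toy_gso_ok() are assumptions made for this
example and do not reflect the real values in the kernel headers.

```c
#include <stdint.h>

/* Hypothetical bit layout for illustration only. */
#define TOY_GSO_SHIFT        16
#define TOY_SKB_GSO_UDP_L4   (1u << 0)
#define TOY_SKB_GSO_FRAGLIST (1u << 1)

#define TOY_F_SG           ((uint64_t)1 << 0)
#define TOY_F_GSO_UDP_L4   ((uint64_t)TOY_SKB_GSO_UDP_L4 << TOY_GSO_SHIFT)
#define TOY_F_GSO_FRAGLIST ((uint64_t)TOY_SKB_GSO_FRAGLIST << TOY_GSO_SHIFT)

/*
 * Returns 1 only when every GSO type bit set on the packet has its
 * matching feature bit set in the features mask, 0 otherwise.
 */
static int toy_gso_ok(uint64_t features, unsigned int gso_type)
{
	uint64_t need = (uint64_t)gso_type << TOY_GSO_SHIFT;

	return (features & need) == need;
}
```

With a features mask that lacks the two GSO bits, toy_gso_ok() returns 0
for a packet typed UDP_L4 | FRAGLIST; once both bits are added to the mask
it returns 1, which is the matching behaviour the patch description relies
on.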