Re: [Intel-wired-lan] bug with rx-udp-gro-forwarding offloading?

From: Paolo Abeni
Date: Wed Jul 05 2023 - 06:29:40 EST


On Tue, 2023-07-04 at 16:27 +0200, Ian Kumlien wrote:
> More stacktraces.. =)
>
> cat bug.txt | ./scripts/decode_stacktrace.sh vmlinux
> [ 411.413767] ------------[ cut here ]------------
> [ 411.413792] WARNING: CPU: 9 PID: 942 at include/net/udp.h:509
> udpv6_queue_rcv_skb (./include/net/udp.h:509 net/ipv6/udp.c:800
> net/ipv6/udp.c:787)

I'm really running out of ideas here...

This is:

WARN_ON_ONCE(UDP_SKB_CB(skb)->partial_cov);

That sort of hints at the skb being shared (skb->users > 1) while
enqueued in multiple places (bridge local input and br forward/flood to
the tun device). I audited the bridge mc flooding code, and I could not
find how a shared skb could land in the local input path.

Anyway, the other splats reported here and in later emails are also
compatible with shared skbs.
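
To make the "shared skb" hypothesis concrete, here is a rough userspace
sketch. It is a toy model, not kernel code: struct toy_skb and all toy_*
helpers are hypothetical names, and they only mimic the idea behind
skb->users, skb_shared() and skb_share_check(). It shows how one path
writing per-skb state becomes visible to another path holding the same
buffer, and how taking a private copy first avoids that.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for struct sk_buff (hypothetical, userspace only). */
struct toy_skb {
	int users;       /* models skb->users */
	int partial_cov; /* models a field kept in the skb control block */
};

/* Models skb_shared(): more than one owner holds this buffer. */
static int toy_shared(const struct toy_skb *skb)
{
	return skb->users > 1;
}

/*
 * Models the idea of skb_share_check(): when the buffer is shared, drop
 * our reference and hand back a private copy (NULL if the copy cannot be
 * allocated); otherwise return the buffer unchanged.
 */
static struct toy_skb *toy_share_check(struct toy_skb *skb)
{
	struct toy_skb *copy;

	if (!toy_shared(skb))
		return skb;

	copy = malloc(sizeof(*copy));
	if (copy) {
		*copy = *skb;
		copy->users = 1;
	}
	skb->users--; /* our reference to the shared buffer is gone */
	return copy;
}

/* Path A writes to a shared buffer; returns what path B then observes. */
static int toy_demo_shared_write(void)
{
	struct toy_skb skb = { .users = 2, .partial_cov = 0 };
	struct toy_skb *path_a = &skb, *path_b = &skb;

	path_a->partial_cov = 1;
	return path_b->partial_cov;
}

/* Same scenario, but path A takes a private copy before writing. */
static int toy_demo_private_write(void)
{
	struct toy_skb skb = { .users = 2, .partial_cov = 0 };
	struct toy_skb *path_b = &skb;
	struct toy_skb *path_a = toy_share_check(&skb);
	int seen;

	if (!path_a)
		return -1;
	path_a->partial_cov = 1;
	seen = path_b->partial_cov;
	free(path_a);
	return seen;
}
```

In this model toy_demo_shared_write() lets path B see the value written
by path A, while toy_demo_private_write() leaves path B's view untouched.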

The above leads to another bunch of questions:
* Can you reproduce the issue after disabling 'rx-gro-list' on the
ingress device (while keeping 'rx-udp-gro-forwarding' on)?
* Do you by chance have any qdiscs on top of the VM tun devices?

The last patch I shared was buggy: it attempted to unclone the skb only
after already touching skb_shared_info.
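
Why the ordering matters can be illustrated with another toy userspace
model (again not kernel code; struct toy_shinfo, struct toy_buf and the
toy_* helpers are hypothetical and only mimic the idea of skb_unclone()
giving a cloned skb its own copy of the data area that holds
skb_shared_info). Writing to the shared info before uncloning changes
what the other head sees; uncloning first keeps the write private.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for skb_shared_info (hypothetical, userspace only). */
struct toy_shinfo {
	int dataref;  /* number of buffer heads referencing this data */
	int gso_size;
};

/* Toy buffer head pointing at possibly shared data. */
struct toy_buf {
	struct toy_shinfo *shinfo;
};

/*
 * Models the idea of skb_unclone(): if another head references the same
 * data, switch this head to a private copy. Returns 0 on success and -1
 * if the copy cannot be allocated.
 */
static int toy_unclone(struct toy_buf *b)
{
	struct toy_shinfo *priv;

	if (b->shinfo->dataref == 1)
		return 0;

	priv = malloc(sizeof(*priv));
	if (!priv)
		return -1;
	*priv = *b->shinfo;
	priv->dataref = 1;
	b->shinfo->dataref--;
	b->shinfo = priv;
	return 0;
}

/*
 * Clear gso_size through our head, either after uncloning (correct) or
 * before it (the buggy order). Returns the gso_size the other head sees
 * afterwards, or -1 on allocation failure.
 */
static int toy_clear_gso(int unclone_first)
{
	struct toy_shinfo shared = { .dataref = 2, .gso_size = 1400 };
	struct toy_buf ours = { &shared }, theirs = { &shared };
	int seen;

	if (unclone_first) {
		if (toy_unclone(&ours))
			return -1;
		ours.shinfo->gso_size = 0; /* touches only our private copy */
	} else {
		ours.shinfo->gso_size = 0; /* still writes into shared data */
		if (toy_unclone(&ours))
			return -1;
	}

	seen = theirs.shinfo->gso_size;
	if (ours.shinfo != &shared)
		free(ours.shinfo);
	return seen;
}
```

With unclone_first set, the other head still sees its original gso_size;
with the buggy order it sees the cleared value.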

Could you please replace that patch with the following one?

Thanks!

Paolo
---
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6c5915efbc17..0b0f4309506d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4261,6 +4261,17 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,

 	skb_push(skb, -skb_network_offset(skb) + offset);

+	if (WARN_ON_ONCE(skb_shared(skb))) {
+		skb = skb_share_check(skb, GFP_ATOMIC);
+		if (!skb)
+			goto err_linearize;
+	}
+
+	/* later code will clear the gso area in the shared info */
+	err = skb_unclone(skb, GFP_ATOMIC);
+	if (err)
+		goto err_linearize;
+
 	skb_shinfo(skb)->frag_list = NULL;

 	while (list_skb) {