Re: [PATCH v3 1/1] gro: decrease size of CB

From: Gal Pressman
Date: Sun Jul 02 2023 - 10:41:44 EST


On 30/06/2023 18:39, Richard Gobert wrote:
> I haven't been able to reproduce it yet, I tried two different setups:
> - 2 VMs running locally on my PC, and a geneve interface for each. Over
> these geneve interfaces, I sent tcp traffic with a similar iperf
> command as yours.
> - A geneve tunnel over veth peers inside two separate namespaces as
> David suggested.
>
> The throughput looked fine and identical with and without my patch in both
> setups.
>
> Although I did validate it while working on the patch, a problem may arise
> from:
> - Packing CB members into a union, which could've led to some sort of
> corruption.
> - Calling `gro_pull_from_frag0` on the current skb before inserting it
> into `gro_list`.
>
> Could I ask you to run some tests:
> - Running the script I attached here on one machine and checking whether
> it reproduces the problem.
> - Reverting part of my commit:
> - Reverting the change to CB struct while keeping the changes to
> `gro_pull_from_frag0`.
> - Checking whether the regression remains.
>
> Also, could you give me some more details:
> - The VMs' NIC and driver. Are you using Qemu?
> - iperf results.
> - The exact kernel versions (commit hashes) you are using.
> - Did you run the commands (sysctl/ethtool) on the receiving VM?
>
>
> Here are the commands I used for the namespaces test's setup:
> ```
> ip netns add ns1
>
> ip link add veth0 type veth peer name veth1
> ip link set veth1 netns ns1
>
> ip a add 192.168.1.1/32 dev veth0
> ip link set veth0 up
> ip r add 192.168.1.0/24 dev veth0
>
> ip netns exec ns1 ip a add 192.168.1.2/32 dev veth1
> ip netns exec ns1 ip link set veth1 up
> ip netns exec ns1 ip r add 192.168.1.0/24 dev veth1
>
> ip link add name gnv0 type geneve id 1000 remote 192.168.1.2
> ip a add 10.0.0.1/32 dev gnv0
> ip link set gnv0 up
> ip r add 10.0.1.1/32 dev gnv0
>
> ip netns exec ns1 ip link add name gnv0 type geneve id 1000 remote 192.168.1.1
> ip netns exec ns1 ip a add 10.0.1.1/32 dev gnv0
> ip netns exec ns1 ip link set gnv0 up
> ip netns exec ns1 ip r add 10.0.0.1/32 dev gnv0
>
> ethtool -K veth0 generic-receive-offload off
> ip netns exec ns1 ethtool -K veth1 generic-receive-offload off
>
> # quick way to enable gro on veth devices
> ethtool -K veth0 tcp-segmentation-offload off
> ip netns exec ns1 ethtool -K veth1 tcp-segmentation-offload off
> ```
>
> I'll continue looking into it on Monday. It would be great if someone from
> your team can write a test that reproduces this issue.
>
> Thanks.

Hey,

I don't have answers to all of your questions yet, but it turns out I
left out an important detail: the issue reproduces when an outer IPv6
header is used.

I'm using ConnectX-6 Dx, with these scripts:

Server:
ip addr add 194.236.5.246/16 dev eth2
ip addr add ::12:236:5:246/96 dev eth2
ip link set dev eth2 up

ip link add p1_g464 type geneve id 464 remote ::12:236:4:245
ip link set dev p1_g464 up
ip addr add 196.236.5.1/16 dev p1_g464

Client:
ip addr add 194.236.4.245/16 dev eth2
ip addr add ::12:236:4:245/96 dev eth2
ip link set dev eth2 up

ip link add p0_g464 type geneve id 464 remote ::12:236:5:246
ip link set dev p0_g464 up
ip addr add 196.236.4.2/16 dev p0_g464

Once everything is set up, running iperf -s on the server and
iperf -c 196.236.5.1 -i1 -t1000
on the client should reproduce the issue.

Unfortunately, I haven't been able to reproduce the same issue with veth
interfaces.
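
For reference, a veth/namespace variant of the setup with an IPv6 outer
header could look roughly like the sketch below. It is only a sketch: the
fd00::/64 underlay addresses, the 10.0.0.0/24 overlay addresses and the
run/setup helpers are assumptions made for illustration, and it has not
been shown to reproduce the issue. It prints the commands by default; set
DRY_RUN=0 (as root) to actually execute them.

```shell
#!/bin/sh
# Sketch: geneve over veth peers with an IPv6 outer header.
# All addresses and helper names below are illustrative assumptions.

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup() {
    run ip netns add ns1

    # veth pair: veth0 stays in the root namespace, veth1 moves to ns1
    run ip link add veth0 type veth peer name veth1
    run ip link set veth1 netns ns1

    # IPv6 underlay; nodad avoids waiting for duplicate address detection
    run ip addr add fd00::1/64 dev veth0 nodad
    run ip link set veth0 up
    run ip netns exec ns1 ip addr add fd00::2/64 dev veth1 nodad
    run ip netns exec ns1 ip link set veth1 up

    # geneve tunnel endpoints pointing at each other's IPv6 address
    run ip link add name gnv0 type geneve id 1000 remote fd00::2
    run ip addr add 10.0.0.1/24 dev gnv0
    run ip link set gnv0 up

    run ip netns exec ns1 ip link add name gnv0 type geneve id 1000 remote fd00::1
    run ip netns exec ns1 ip addr add 10.0.0.2/24 dev gnv0
    run ip netns exec ns1 ip link set gnv0 up
}

setup
```

With this in place, iperf -s inside ns1 and iperf -c 10.0.0.2 from the root
namespace would send TCP traffic through the tunnel.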

Reverting the napi_gro_cb part indeed resolves the issue.

Thanks for taking a look!