Re: [PATCH net-next v4 4/4] net: gro: move L3 flush checks to tcp_gro_receive

From: Richard Gobert
Date: Wed Mar 27 2024 - 12:13:58 EST


Paolo Abeni wrote:
> On Tue, 2024-03-26 at 18:25 +0100, Richard Gobert wrote:
>> Paolo Abeni wrote:
>>> Hi,
>>>
>>> On Tue, 2024-03-26 at 16:02 +0100, Richard Gobert wrote:
>>>> This patch is meaningful by itself: it removes checks against non-relevant
>>>> packets and performs the flush/flush_id checks in a single place.
>>>
>>> I'm personally not sure this patch is a win. The code churn is
>>> significant. I understand this is for performance's sake, but I don't
>>> see the benefit.
>>>
>>
>> Could you clarify what you mean by code churn?
>
> The diffstat of this patch is not negligible and touches very sensitive
> areas.
>

The diff mainly touches flush/flush_id/is_atomic, and the new code should be
less complex. I agree this is sensitive, as it is part of core GRO.
I checked all relevant flows manually, but I can also add more
tests to ensure that the logic remains the same.

>>> The changelog shows that perf reports slightly lower figures for
>>> inet_gro_receive(). That is expected, as this patch moves code out of
>>> that function. What about inet_gro_flush()/tcp_gro_receive(), where
>>> that code is moved?
>>>
>>
>> Please consider the following two common scenarios:
>>
>> 1) Multiple packets in the GRO bucket, which is common (e.g. when
>> running super_netperf TCP_STREAM): each layer executes a for loop over
>> every packet in the bucket. Specifically, L3 gro_receive loops over the
>> bucket performing the flush, flush_id and is_atomic checks.
>
> Only for packets with the same rx hash.
>

Right, but there are only 8 GRO buckets, so collisions can still happen
with multiple concurrent streams.
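To make the collision argument concrete, here is a simplified model, not kernel code. It assumes the bucket index is the low bits of the rx hash (the rx hash masked with the bucket count minus one, which is how I understand dev_gro_receive() picks the bucket). SKETCH_GRO_BUCKETS, sketch_bucket() and sketch_max_bucket_load() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 8 GRO buckets; not a kernel definition. */
#define SKETCH_GRO_BUCKETS 8u

/* Masking only acts as a modulo when the bucket count is a power of two. */
_Static_assert((SKETCH_GRO_BUCKETS & (SKETCH_GRO_BUCKETS - 1)) == 0,
	       "bucket count must be a power of two");

/* Assumed mapping: keep the low bits of the rx hash as the bucket index. */
static unsigned int sketch_bucket(uint32_t rx_hash)
{
	return rx_hash & (SKETCH_GRO_BUCKETS - 1);
}

/* Return the largest number of flows (one rx hash each) that share a
 * bucket. Anything above 1 means distinct flows sit in the same bucket. */
static unsigned int sketch_max_bucket_load(const uint32_t *hashes, size_t n)
{
	unsigned int load[SKETCH_GRO_BUCKETS] = { 0 };
	unsigned int max = 0;

	assert(hashes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		unsigned int b = sketch_bucket(hashes[i]);

		if (++load[b] > max)
			max = load[b];
	}
	return max;
}
```

With nine or more concurrent flows, at least two must share a bucket regardless of their hashes. The model only covers bucket placement, not what each layer then does for the other packets in a shared bucket.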

>> For most packets in the bucket, these checks are not relevant (and
>> they may also dirty cache lines of non-relevant packets). Removing code
>> from the for loop is significant in this case.
>>
>> 2) UDP/TCP streams that do not coalesce in GRO. This is the common case
>> for regular UDP connections (e.g. running netperf UDP_STREAM). In this
>> case, GRO is pure overhead, so removing any code from these layers
>> helps (as shown in the first measurement in the commit message).
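Scenario 1 above can be illustrated with a cost model that counts how often the L3 checks run. This is a hypothetical sketch, not the patch: sketch_l3_flush() is a placeholder for the flush/flush_id/is_atomic logic and does not reproduce the real conditions. In the "before" shape, L3 evaluates the checks for every candidate in the bucket and L4 later uses the verdict stored on the matched packet; in the "after" shape, L4 finds the match first and the checks run once, on that packet only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified packet held in a GRO bucket. */
struct sketch_pkt {
	bool same_flow;   /* L3 header fields matched the new packet */
	bool l4_match;    /* L4 ports matched as well */
	unsigned int id;  /* stand-in for per-packet L3 ID state */
	bool flush;       /* verdict stored by the "before" shape */
};

/* Number of times the placeholder L3 checks have run. */
static unsigned int sketch_l3_calls;

/* Placeholder for the flush/flush_id/is_atomic work, NOT the real logic:
 * it flushes unless the new ID directly follows the stored one. */
static bool sketch_l3_flush(const struct sketch_pkt *p, unsigned int new_id)
{
	assert(p->same_flow);  /* checks only make sense for candidates */
	sketch_l3_calls++;
	return new_id != p->id + 1;
}

/* Before: L3 runs the checks for every candidate, then L4 picks the match.
 * Returns the flush verdict (0/1), or -1 if no packet matched. */
static int sketch_before(struct sketch_pkt *bucket, size_t n, unsigned int new_id)
{
	for (size_t i = 0; i < n; i++)
		if (bucket[i].same_flow)
			bucket[i].flush = sketch_l3_flush(&bucket[i], new_id);

	for (size_t i = 0; i < n; i++)
		if (bucket[i].same_flow && bucket[i].l4_match)
			return bucket[i].flush;
	return -1;
}

/* After: L4 picks the match first, then the checks run on it only. */
static int sketch_after(struct sketch_pkt *bucket, size_t n, unsigned int new_id)
{
	for (size_t i = 0; i < n; i++)
		if (bucket[i].same_flow && bucket[i].l4_match)
			return sketch_l3_flush(&bucket[i], new_id);
	return -1;
}
```

Both shapes return the same verdict; the difference is that the "before" shape pays for the checks once per candidate instead of once per received packet, which is the per-loop cost scenario 1 is about.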
>
> If UDP GRO is not enabled, no UDP packets are staged in the UDP
> GRO engine, so the bucket list is empty.
>
>>> Additionally, the reported deltas are within the noise level, according
>>> to my personal experience with similar tests.
>>>
>>
>> I've tested the difference between net-next and this patch repeatedly,
>> and the results were stable each time. Is there any specific test you
>> think would help show the result?
>
> Anything that shows a measurable gain.
>
> Reporting the CPU utilization in the inet_gro_receive() function alone
> is not enough, as part of the load has been moved into
> gro_network_flush()/tcp_gro_receive().
>

Got it. The numbers I reported were only relevant to UDP flows (so
measuring with perf top -g showed the same improvement). In v5 I'll also
post numbers relevant to TCP.
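For the TCP numbers, the comparison Paolo asks for amounts to summing the self-time share of every function the code moves between, taken from each profile (for example a flat perf report listing), and comparing those totals for net-next and the patched kernel. A minimal sketch, assuming the profile has already been parsed into rows; struct sketch_sample, sketch_touched[] and sketch_touched_total() are hypothetical, and the symbol list uses the function names as they appear in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of a flat profile: symbol name and its self-time share (%). */
struct sketch_sample {
	const char *sym;
	double self_pct;
};

/* Functions the patch moves work between, as named in this thread. */
static const char *const sketch_touched[] = {
	"inet_gro_receive", "gro_network_flush", "tcp_gro_receive",
};

/* Sum the self-time of all touched functions, so that work moved from
 * one of them into another is not mistaken for a saving. */
static double sketch_touched_total(const struct sketch_sample *rows, size_t n)
{
	const size_t nsyms = sizeof(sketch_touched) / sizeof(sketch_touched[0]);
	double total = 0.0;

	assert(rows != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		for (size_t j = 0; j < nsyms; j++)
			if (strcmp(rows[i].sym, sketch_touched[j]) == 0)
				total += rows[i].self_pct;
	return total;
}
```

If inet_gro_receive() shrinks while gro_network_flush()/tcp_gro_receive() grow by the same amount, the total stays flat, which is exactly the case a single-function figure would hide.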

Thanks