RE: AW: [PATCH] amd64: Fix csum_partial_copy_generic()

From: David Laight
Date: Fri Oct 20 2023 - 16:28:09 EST


From: Al Viro
> Sent: 19 October 2023 09:06
>
> On Thu, Oct 19, 2023 at 09:39:45AM +0200, Eric Dumazet wrote:
>
> > I wonder if the csum_and_copy_...() helpers are really needed in modern days,
> > with much bigger cpu caches.

The L1 data caches aren't that big.
OTOH ethernet chips tend to do checksum offloading.
I do wonder if the extra complexity is worth it.

Which code paths is it used on?

ICMP can't possibly be common enough to make any difference.
It can't be used for TCP receive - the check has to be done earlier.
Plausibly TCP transmit - but the bytestream nature must make that hard.
It might be usable on UDP transmit - but they'll be small and
still in the L1 cache later on.
For UDP receive I guess deferring the checksum check until recv()
might save the date being loaded in the cache twice - but it doesn't
need to be done with the copy for typical short UDP.

> >
> > Maybe we could remove them and use more standard copy + standard
> > checksum over kernel buffers.
>
> FWIW, the reason we don't hit that shit all the time is that on almost
> all paths all-zeroes block of data would be rejected anyway/could not
> happen. Note that e.g. for ICMPv6 the csum includes the pseudo-header
> and there's no way for that to be all-zeroes, etc.
>
> Whatever we do long-term (and I'd really like to get that mess dealt
> with properly - fuckup is definitely mine, and I should have checked
> the users of that stuff properly back then), I don't believe that
> it's doable this late in the cycle.

What is wrong with something akin to the original suggested patch?
So that the csum fragment function can never return zero.

For x86 the fastest code might be 'xor %eax,%eax; dec %eax'
to avoid the 32bit immediate constant.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)