RE: [PATCH v5 00/11] iov_iter: Convert the iterator macros into inline funcs

From: David Laight
Date: Sat Sep 23 2023 - 06:31:57 EST


From: Willem de Bruijn
> Sent: 23 September 2023 07:59
>
> On Fri, Sep 22, 2023 at 2:01 PM David Howells <dhowells@xxxxxxxxxx> wrote:
> >
> > David Laight <David.Laight@xxxxxxxxxx> wrote:
> >
> > > > (8) Move the copy-and-csum code to net/ where it can be in proximity with
> > > > the code that uses it. This eliminates the code if CONFIG_NET=n and
> > > > allows for the slim possibility of it being inlined.
> > > >
> > > > (9) Fold memcpy_and_csum() in to its two users.
> > > >
> > > > (10) Move csum_and_copy_from_iter_full() out of line and merge in
> > > > csum_and_copy_from_iter() since the former is the only caller of the
> > > > latter.
> > >
> > > I thought that the real idea behind these was to do the checksum
> > > at the same time as the copy to avoid loading the data into the L1
> > > data-cache twice - especially for long buffers.
> > > I wonder how often there are multiple iov[] that actually make
> > > it better than just check summing the linear buffer?
> >
> > It also reduces the overhead for finding the data to checksum in the case the
> > packet gets split since we're doing the checksumming as we copy - but with a
> > linear buffer, that's negligible.
> >
> > > I had a feeling that check summing of udp data was done during
> > > copy_to/from_user, but the code can't be the copy-and-csum here
> > > > for that because it is missing support for odd-length buffers.
> >
> > Is there a bug there?

No, I misread the code - I shouldn't scan patches when I've
got a viral head cold...

...
> > You may be right. That's more a question for the networking folks than for
> > me. It's entirely possible that the checksumming code is just not used on
> > modern systems these days.
> >
> > Maybe Willem can comment since he's the UDP maintainer?
>
> Perhaps these days it is more relevant to embedded systems than high
> end servers.

The checksum and copy are done together.
I probably missed it because the function isn't passed the
old checksum (which it can pretty much process for free).
Instead the caller is adding it afterwards - which involves
an extra explicit csum_add().
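
To illustrate the point, here is a rough user-space sketch, not the
kernel code. The names sketch_copy_and_csum() and sketch_csum_add() are
made up for this example, and the data is summed as big-endian 16-bit
words with a zero-padded odd trailing byte. Seeding the accumulator with
the old checksum costs nothing extra inside the loop, whereas starting
from zero leaves the caller to do a separate add afterwards:

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a wide accumulator down to a 16-bit ones' complement sum. */
static uint16_t fold16(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Hypothetical stand-in for csum_add(): ones' complement add of two sums. */
static uint16_t sketch_csum_add(uint16_t a, uint16_t b)
{
	return fold16((uint64_t)a + b);
}

/*
 * Hypothetical helper: copy len bytes from src to dst and, in the same
 * pass, sum them as big-endian 16-bit words starting from 'seed'.
 */
static uint16_t sketch_copy_and_csum(void *dst, const void *src,
				     size_t len, uint16_t seed)
{
	const uint8_t *s = src;
	uint8_t *d = dst;
	uint64_t sum = seed;	/* the old checksum goes in here for free */
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		d[i] = s[i];
		d[i + 1] = s[i + 1];
		sum += ((uint32_t)s[i] << 8) | s[i + 1];
	}
	if (i < len) {		/* odd trailing byte, padded with zero */
		d[i] = s[i];
		sum += (uint32_t)s[i] << 8;
	}
	return fold16(sum);
}
```

Calling sketch_copy_and_csum(dst, src, len, old) should give the same
value as sketch_csum_add(sketch_copy_and_csum(dst, src, len, 0), old);
the second form just pays for the extra add in the caller.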

The x86-64 IP checksum loops are all horrid though.
The unrolling in them is so 1990s.
With the out-of-order pipeline the memory accesses tend
to take care of themselves.
Not to mention that a whole raft of (now oldish) CPUs take two
clocks to execute 'adc'.
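
As one possible illustration (a sketch only, not a proposal from this
thread and not the kernel's implementation), a loop can avoid 'adc'
altogether by adding 32-bit words into a 64-bit accumulator and folding
once at the end. The name sketch_csum32() is hypothetical, and the
function assumes the length is a multiple of 4. The bytes are assembled
big-endian here purely to keep the result portable; real code would load
whole words directly.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: ones' complement sum of a buffer, taken as
 * big-endian 32-bit words. Each step is a plain 64-bit add; carries out
 * of the low 32 bits simply collect in the high half of the accumulator,
 * so no add-with-carry is needed inside the loop.
 *
 * Assumes len is a multiple of 4 (any trailing bytes are ignored) and
 * that len is below 2^34 bytes so the accumulator cannot overflow.
 */
static uint16_t sketch_csum32(const uint8_t *p, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i + 4 <= len; i += 4)
		sum += ((uint32_t)p[i] << 24) | ((uint32_t)p[i + 1] << 16) |
		       ((uint32_t)p[i + 2] << 8) | p[i + 3];

	/* Fold the 64-bit total down to 16 bits with end-around carry. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)sum;
}
```

The loop body is deliberately not unrolled; whether that is fast enough
on a given CPU would need measuring.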

David
