RE: x86/csum: Remove unnecessary odd handling

From: David Laight
Date: Fri Jan 05 2024 - 11:12:40 EST


From: David Laight
> Sent: 05 January 2024 10:41
>
> From: Linus Torvalds
> > Sent: 05 January 2024 00:33
> >
> > On Thu, 4 Jan 2024 at 15:36, Linus Torvalds
> > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > Anyway, since I looked at the thing originally, and feel like I know
> > > the x86 side and understand the strange IP csum too, I just applied it
> > > directly.
> >
> > I ended up just applying my 40-byte cleanup thing too that I've been
> > keeping in my own tree since posting it (as the "Silly csum
> > improvement. Maybe" patch).
>
> Interesting, I'm pretty sure trying to get two blocks of
> 'adc' scheduled in parallel like that doesn't work.
>
> I got an adc every clock from this 'beast':
> + /*
> + * Align the byte count to a multiple of 16 then
> + * add 64 bit words to alternating registers.
> + * Finally reduce to 64 bits.
> + */
> + asm( " bt $4, %[len]\n"
> + " jnc 10f\n"
> + " add (%[buff], %[len]), %[sum_0]\n"
> + " adc 8(%[buff], %[len]), %[sum_1]\n"
> + " lea 16(%[len]), %[len]\n"
> + "10: jecxz 20f\n"
> + " adc (%[buff], %[len]), %[sum_0]\n"
> + " adc 8(%[buff], %[len]), %[sum_1]\n"
> + " lea 32(%[len]), %[len_tmp]\n"
> + " adc 16(%[buff], %[len]), %[sum_0]\n"
> + " adc 24(%[buff], %[len]), %[sum_1]\n"
> + " mov %[len_tmp], %[len]\n"
> + " jmp 10b\n"
> + "20: adc %[sum_0], %[sum]\n"
> + " adc %[sum_1], %[sum]\n"
> + " adc $0, %[sum]\n"
> + : [sum] "+&r" (sum), [sum_0] "+&r" (sum_0), [sum_1] "+&r" (sum_1),
> + [len] "+&c" (len), [len_tmp] "=&r" (len_tmp)
> + : [buff] "r" (buff)
> + : "memory" );

I've got far too many x86 checksum functions lying around.

Actually you don't need all that.
Anything recent (probably Broadwell on) will execute:
"10: jecxz 20f\n"
" adc (%[buff], %[len]), %[sum]\n"
" adc 8(%[buff], %[len]), %[sum]\n"
" lea 16(%[len]), %[len]\n"
" jmp 10b\n"
"20: adc $0, %[sum]\n"
in two clocks per iteration - 8 bytes/clock.
Since it is trivial to handle 16n+8 byte buffers (e.g. with a 'bt' test
as above), that only leaves the C code to handle the final 0-7 bytes.
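
As a rough illustration only (this is not the kernel code), the split
between the 64-bit word loop and the C handling of the trailing bytes might
look like the sketch below. The names csum_sketch, add64_with_carry and
fold64 are made up for this example, and the plain C loop merely stands in
for the adc chain; a compiler will not necessarily emit one adc per word
from it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add with end-around carry (the ones' complement sum used by IP checksums). */
static uint64_t add64_with_carry(uint64_t a, uint64_t b)
{
	a += b;
	return a + (a < b);	/* wrap the carry-out back into bit 0 */
}

/* Hypothetical helper: accumulate 'len' bytes of 'buff' into 'sum'. */
static uint64_t csum_sketch(const unsigned char *buff, size_t len, uint64_t sum)
{
	size_t words = len / 8;
	const unsigned char *end = buff + words * 8;
	ptrdiff_t off = -(ptrdiff_t)(words * 8);

	/* Same shape as the asm loop: a negative offset counting up to zero. */
	for (; off != 0; off += 8) {
		uint64_t w;

		memcpy(&w, end + off, sizeof(w));	/* unaligned-safe load */
		sum = add64_with_carry(sum, w);
	}

	/* The final 0-7 bytes, handled in C. */
	if (len & 7) {
		uint64_t w = 0;

		/* Equivalent to padding the buffer with zero bytes. */
		memcpy(&w, end, len & 7);
		sum = add64_with_carry(sum, w);
	}
	return sum;
}

/* Reduce the 64-bit sum to 16 bits, folding the carries back in. */
static uint16_t fold64(uint64_t sum)
{
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

The sketch returns the folded sum without the final inversion. Copying the
tail into a zeroed word keeps the byte order of the buffer, so the result
should match a 16-bit word-at-a-time sum for even lengths on either
endianness.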

> Maybe I'll sort out another patch...

Probably after the next rc1 is out.

David
