Re: [linus:master] [iov_iter] c9eec08bac: vm-scalability.throughput -16.9% regression

From: Linus Torvalds
Date: Wed Nov 15 2023 - 14:26:46 EST


On Wed, 15 Nov 2023 at 14:10, Borislav Petkov <bp@xxxxxxxxx> wrote:
>
> > Borislav, see
> >
> > https://lore.kernel.org/all/CAHk-=wjCUckvZUQf7gqp2ziJUWxVpikM_6srFdbcNdBJTxExRg@xxxxxxxxxxxxxx/
> >
> > for some truly crazy code generation by gcc.
>
> Yeah, lemme show that to gcc folks. That asm is with your compiler,
> right? Version?

That was with gcc version 13.2.1.

Note that I only see that crazy thing in lib/iov_iter.s, so I really
do think it has something to do with inlining __builtin_memcpy()
behind a conditional function pointer.
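For readers unfamiliar with the shape being described, here is a minimal sketch, assuming a generic always-inline iterator that receives its per-chunk "step" through a function pointer chosen by a condition at the call site. All names (step_fn, copy_step, zero_step, iterate, copy_or_zero) are hypothetical and this is not the actual lib/iov_iter.c code. Once the iterator is inlined, gcc can end up looking at a direct __builtin_memcpy() whose length is not a compile-time constant.

```c
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* Hypothetical step callback: handle 'len' bytes, return bytes NOT done. */
typedef size_t (*step_fn)(void *dst, const void *src, size_t len);

/* Plain copy step. After inlining through the pointer, this is a
 * __builtin_memcpy() with a variable length. */
static inline size_t copy_step(void *dst, const void *src, size_t len)
{
	__builtin_memcpy(dst, src, len);
	return 0;
}

/* Hypothetical alternative step; it zero-fills so the two paths differ. */
static inline size_t zero_step(void *dst, const void *src, size_t len)
{
	(void)src;
	memset(dst, 0, len);
	return 0;
}

/* Generic iterator, forced inline into every caller. */
static inline __attribute__((always_inline))
size_t iterate(void *dst, const void *src, size_t len, step_fn step)
{
	return step(dst, src, len);
}

/* The step is picked conditionally, then handed to the inlined iterator. */
size_t copy_or_zero(void *dst, const void *src, size_t len, int zero)
{
	return iterate(dst, src, len, zero ? zero_step : copy_step);
}
```

Compiling this with `gcc -O2 -S` and reading the generated .s file is one way to see what a given gcc version does with the inlined memcpy.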

In normal cases, gcc seems to just do the obvious thing (ie expand a
small constant-sized memcpy inline, or just call the external 'memcpy'
function).
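The two "normal" outcomes can be illustrated with a small sketch. The function names are hypothetical, and the comments describe what gcc usually does in an optimized userspace build, not a guarantee for every flag combination or target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Constant-size copy: typically becomes a single 8-byte load and store,
 * with no call and no loop. */
uint64_t load_u64(const void *p)
{
	uint64_t v;
	memcpy(&v, p, sizeof(v));
	return v;
}

/* Variable-size copy: in the normal case this is emitted as a call to
 * the external memcpy() rather than as an inline loop. */
void copy_bytes(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
}
```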

So it's some odd pattern that triggers that "expand non-constant
memcpy inline" behavior. And once that happens, the resulting code
is still a bit odd, but it is at least explicable.

That "do first word by hand, then do aligned 'rep movsq' on top of it"
pattern is weird, but we've seen similarly strange patterns in
hand-written memcpy implementations (eg "use two overlapping 8-byte
writes to handle the 8-15 byte case").
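Both tricks can be sketched in portable C. This is an illustration of the ideas, not gcc's actual output: the 'rep movsq' is replaced by a plain loop over 8-byte words, the helper names are hypothetical, and storing the last word up front to cover the tail is an assumption made here so the sketch is complete.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Load/store one unaligned 8-byte word; memcpy keeps this portable. */
static uint64_t ld64(const unsigned char *p)
{
	uint64_t w;
	memcpy(&w, p, sizeof(w));
	return w;
}

static void st64(unsigned char *p, uint64_t w)
{
	memcpy(p, &w, sizeof(w));
}

/* 8..15 bytes: two possibly overlapping 8-byte moves cover every byte. */
static void copy_8_15(unsigned char *dst, const unsigned char *src, size_t n)
{
	uint64_t head = ld64(src);
	uint64_t tail = ld64(src + n - 8);

	st64(dst, head);
	st64(dst + n - 8, tail);
}

/* n >= 16: first word by hand, then whole words to an aligned dst. */
static void copy_head_aligned(unsigned char *dst, const unsigned char *src,
			      size_t n)
{
	size_t skip, words, i;

	st64(dst, ld64(src));                 /* unaligned first word */
	st64(dst + n - 8, ld64(src + n - 8)); /* last word (assumed tail handling) */

	/* Step dst to the next 8-byte boundary. skip is 1..8, so every
	 * skipped byte was already written by the first word. */
	skip = 8 - ((uintptr_t)dst & 7);
	dst += skip;
	src += skip;
	n -= skip;

	/* Bulk part, standing in for 'rep movsq' with rcx = n / 8. The
	 * leftover n % 8 bytes were covered by the last-word store. */
	words = n / 8;
	for (i = 0; i < words; i++)
		st64(dst + 8 * i, ld64(src + 8 * i));
}

void small_memcpy(void *d, const void *s, size_t n)
{
	unsigned char *dst = d;
	const unsigned char *src = s;

	if (n >= 16)
		copy_head_aligned(dst, src, n);
	else if (n >= 8)
		copy_8_15(dst, src, n);
	else
		while (n--)
			*dst++ = *src++;
}
```

The overlapping stores rewrite some bytes twice with the same data, which is harmless for memcpy since source and destination may not overlap.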

So the real issue is that we don't want an inlined memcpy at all,
unless it's the simple constant-sized case that has been turned into
individual moves with no loop.
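One way to express "inline only the simple constant-sized case" in C is a wrapper that checks __builtin_constant_p() on the length. This is a hypothetical sketch: the macro name, the 16-byte threshold and copy_outofline() are all made up for illustration, and it is not what the kernel does. Note that the macro evaluates 'n' more than once.

```c
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* noinline keeps this a real function, so the variable-size case is a
 * call at each use site; its own memcpy is normally a library call. */
__attribute__((noinline))
void *copy_outofline(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* Hypothetical wrapper: let the compiler expand the copy inline only for
 * small compile-time-constant sizes, otherwise force the call above. */
#define small_copy(dst, src, n)                                  \
	(__builtin_constant_p(n) && (n) <= 16                      \
		? __builtin_memcpy((dst), (src), (n))              \
		: copy_outofline((dst), (src), (n)))
```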

Or it's a "rep movsb" selected as a CPUID-based alternative when the
CPU has FSRM (Fast Short REP MOV), of course.
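A userspace approximation of that choice might look like the sketch below. The kernel's alternatives mechanism patches the instruction sequence based on CPUID features; this sketch instead tests a flag at run time, so it only illustrates the decision. FSRM is read from CPUID leaf 7, subleaf 0, EDX bit 4. "rep movsb" is always correct on x86, and FSRM only says it is fast for short lengths. The function names are hypothetical, and non-x86 builds simply fall back to memcpy().

```c
#include <stddef.h>
#include <string.h>
#include <stdbool.h>
#include <assert.h>
#if defined(__x86_64__)
#include <cpuid.h>
#endif

/* FSRM: CPUID.(EAX=7,ECX=0):EDX bit 4. */
static bool cpu_has_fsrm(void)
{
#if defined(__x86_64__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return false;	/* leaf 7 not supported */
	return (edx >> 4) & 1;
#else
	return false;
#endif
}

/* rep movsb copies rcx bytes from rsi to rdi; DF is clear per the ABI. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__)
	void *ret = dst;

	__asm__ volatile("rep movsb"
			 : "+D"(dst), "+S"(src), "+c"(n)
			 :
			 : "memory");
	return ret;
#else
	return memcpy(dst, src, n);
#endif
}

void *copy_fsrm(void *dst, const void *src, size_t n)
{
	static int has_fsrm = -1;	/* lazily initialised, not thread-safe */

	if (has_fsrm < 0)
		has_fsrm = cpu_has_fsrm();
	return has_fsrm ? copy_rep_movsb(dst, src, n) : memcpy(dst, src, n);
}
```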

Linus