Re: [linus:master] [iov_iter] c9eec08bac: vm-scalability.throughput -16.9% regression

From: Jakub Jelinek
Date: Fri Nov 17 2023 - 07:10:13 EST


On Fri, Nov 17, 2023 at 12:44:21PM +0100, Borislav Petkov wrote:
> Might as well Cc toolchains...
>
> On Thu, Nov 16, 2023 at 11:48:18AM -0500, Linus Torvalds wrote:
> > Hmm. I know about the '-mstringop-strategy' flag because of the fairly
> > recently discussed bug where gcc would create a byte-by-byte copy in
> > some crazy circumstances with the address space attributes:
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111657
>
> I hear those stringop strategy heuristics are interesting. :)
>
> > But I incorrectly thought that "-mstringop-strategy=libcall" would
> > then *always* do library calls.
>
> That's how I understood it too. BUT, reportedly, small and known sizes
> are still optimized, which is exactly what we want.

Sure. -mstringop-strategy affects only x86 expansion of the stringops
from GIMPLE to RTL, while for small constant sizes some folding can happen
far earlier in generic code. Similarly, the generic copy/store by pieces
handling (straight-line code expansion of the builtins) is done in some
cases without invoking the backend optabs, and the optabs expansion is the
only one affected by the strategy.
Note, the default strategy depends on the sizes, the -mtune= in effect,
whether it is -Os or -O2, etc. And the argument for -mmemcpy-strategy=
or -mmemset-strategy= can include details on which sizes should be handled
by which algorithm; not everything needs to be done the same way.
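
To make the distinction concrete, here is a minimal sketch (the function
names are made up for illustration and are not from the kernel). The first
copy has a small constant size, so generic code can be expected to fold it
before the x86 expansion runs; the second has a runtime length, which is the
case where the strategy options would apply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Small constant size: generic folding can typically turn this into a
 * plain 8-byte load/store, regardless of -mstringop-strategy=. */
uint64_t load_u64(const void *p)
{
    uint64_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}

/* Length known only at run time: this is expanded by the x86 backend,
 * so the chosen strategy decides between inline code and a libcall. */
void copy_buf(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Hypothetical self-check: both paths must copy the same bytes no matter
 * how the compiler expanded them. Returns 1 on success. */
int check_copies(void)
{
    unsigned char src[32], dst[32] = { 0 };
    uint64_t expect;
    size_t i;

    for (i = 0; i < sizeof(src); i++)
        src[i] = (unsigned char)(i + 1);

    copy_buf(dst, src, sizeof(src));
    assert(memcmp(dst, src, sizeof(src)) == 0);

    memcpy(&expect, src, sizeof(expect));
    return load_u64(dst) == expect;
}
```

Compiling this with something like "gcc -O2 -S -mstringop-strategy=libcall"
and reading the assembly should show whether load_u64() stays inline while
copy_buf() becomes a call to memcpy; the exact output depends on the
compiler version and tuning. For per-size control, the GCC documentation
describes the -mmemcpy-strategy= argument as a comma-separated list of
alg:max_size:dest_align triplets, with max_size of -1 in the last one, e.g.
something along the lines of "unrolled_loop:64:noalign,libcall:-1:noalign".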

> > IOW, my assumption was just broken, and using
> > "-mstringop-strategy=libcall" may well be the right thing to do.
>
> And here's where I'm wondering whether we should enable it for x86 only
> or globally. I think globally because those stringop heuristics happen,
> AFAIU, in the general optimization stage and thus target agnostic.

The -mstringop-strategy= option is x86-specific, so I don't see how you
could enable it on other architectures.

Anyway, if you are just trying to work around bugs in specific compilers,
please limit it to the affected compilers. Overriding this in the kernel
makefiles forever with the workaround would mean you force perhaps
suboptimal expansion in various cases.

Jakub