Re: [linus:master] [iov_iter] c9eec08bac: vm-scalability.throughput -16.9% regression

From: Linus Torvalds
Date: Wed Nov 15 2023 - 11:53:33 EST


On Wed, 15 Nov 2023 at 10:28, David Howells <dhowells@xxxxxxxxxx> wrote:
>
> But the outcome is a bit variable and the result spaces overlap considerably.
> I certainly don't see a 17% performance reduction. Now, this may be due to
> hardware differences. The CPU I'm using is an Intel i3-4170 - which is a few
> years old at this point.

I tried to look at the perf profile changes in the original report,
and very little of it makes sense to me.

Having looked at quite a lot of those in the past (although certainly
fewer than Oliver), that's *usually* a result of a test that is unstable.

In this case, though, I think the big difference is

-11.0 perf-profile.self.cycles-pp.memcpy_orig
+14.7 perf-profile.self.cycles-pp.copy_page_from_iter_atomic

which is a bit odd. It looks like the old code used to use a regular
out-of-line memcpy (and that machine doesn't have FSRM), and the new
code for some reason does it inline.

I wonder if gcc somehow decided to inline "memcpy()" in
memcpy_from_iter() as a "rep movsb" because of other inlining changes?

[ Goes out to look ]

Yup, I think that's exactly what happened. Gcc seems to decide that it
might be a small memcpy(), and seems to do at least part of it
directly.

So I *think* this all is mainly an artifact of gcc having changed code
generation due to the code re-organization.

Linus