Re: [linus:master] [iov_iter] c9eec08bac: vm-scalability.throughput -16.9% regression

From: Borislav Petkov
Date: Thu Nov 16 2023 - 10:44:36 EST


On Wed, Nov 15, 2023 at 02:26:02PM -0500, Linus Torvalds wrote:
> So the real issue is that we don't want an inlined memcpy at all,
> unless it's the simple constant-sized case that has been turned into
> individual moves with no loop.
>
> Or it's a "rep movsb" with FSRM as a CPUID-based alternative, of course.

Reportedly and apparently, this pretty much addresses the issue at hand.
However, I'd still like for the compiler to handle the small length
cases by issuing plain MOVs instead of blindly doing "call memcpy".

Lemme see how it would work with your patch...


diff --git a/Makefile b/Makefile
index ede0bd241056..94d93070d54a 100644
--- a/Makefile
+++ b/Makefile
@@ -996,6 +996,8 @@ endif
# change __FILE__ to the relative path from the srctree
KBUILD_CPPFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)

+KBUILD_CFLAGS += $(call cc-option,-mstringop-strategy=libcall)
+
# include additional Makefiles when needed
include-y := scripts/Makefile.extrawarn
include-$(CONFIG_DEBUG_INFO) += scripts/Makefile.debug

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette