[tip: x86/percpu] x86/percpu: Rewrite arch_raw_cpu_ptr() to be easier for compilers to optimize

From: tip-bot2 for Uros Bizjak
Date: Mon Oct 16 2023 - 07:09:20 EST


The following commit has been merged into the x86/percpu branch of tip:

Commit-ID: a048d3abae7c33f0a3f4575fab15ac5504d443f7
Gitweb: https://git.kernel.org/tip/a048d3abae7c33f0a3f4575fab15ac5504d443f7
Author: Uros Bizjak <ubizjak@xxxxxxxxx>
AuthorDate: Sun, 15 Oct 2023 22:24:39 +02:00
Committer: Ingo Molnar <mingo@xxxxxxxxxx>
CommitterDate: Mon, 16 Oct 2023 12:51:58 +02:00

x86/percpu: Rewrite arch_raw_cpu_ptr() to be easier for compilers to optimize

Implement arch_raw_cpu_ptr() as a load from this_cpu_off and then
add the ptr value to the base. This way, the compiler can propagate
addend to the following instruction and simplify address calculation.

E.g.: address calcuation in amd_pmu_enable_virt() improves from:

48 c7 c0 00 00 00 00 mov $0x0,%rax
87b7: R_X86_64_32S cpu_hw_events

65 48 03 05 00 00 00 add %gs:0x0(%rip),%rax
00
87bf: R_X86_64_PC32 this_cpu_off-0x4

48 c7 80 28 13 00 00 movq $0x0,0x1328(%rax)
00 00 00 00

to:

65 48 8b 05 00 00 00 mov %gs:0x0(%rip),%rax
00
8798: R_X86_64_PC32 this_cpu_off-0x4
48 c7 80 00 00 00 00 movq $0x0,0x0(%rax)
00 00 00 00
87a6: R_X86_64_32S cpu_hw_events+0x1328

The compiler also eliminates additional redundant loads from
this_cpu_off, reducing the number of percpu offset reads
from 1668 to 1646 on a test build, a -1.3% reduction.

Signed-off-by: Uros Bizjak <ubizjak@xxxxxxxxx>
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxx>
Cc: Brian Gerst <brgerst@xxxxxxxxx>
Cc: Denys Vlasenko <dvlasenk@xxxxxxxxxx>
Cc: H. Peter Anvin <hpa@xxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
Cc: Uros Bizjak <ubizjak@xxxxxxxxx>
Cc: Sean Christopherson <seanjc@xxxxxxxxxx>
Link: https://lore.kernel.org/r/20231015202523.189168-1-ubizjak@xxxxxxxxx
---
arch/x86/include/asm/percpu.h | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 60ea775..915675f 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -56,9 +56,11 @@
#define arch_raw_cpu_ptr(ptr) \
({ \
unsigned long tcp_ptr__; \
- asm ("add " __percpu_arg(1) ", %0" \
+ asm ("mov " __percpu_arg(1) ", %0" \
: "=r" (tcp_ptr__) \
- : "m" (__my_cpu_var(this_cpu_off)), "0" (ptr)); \
+ : "m" (__my_cpu_var(this_cpu_off))); \
+ \
+ tcp_ptr__ += (unsigned long)(ptr); \
(typeof(*(ptr)) __kernel __force *)tcp_ptr__; \
})
#else /* CONFIG_SMP */