[RFC PATCH 0/4] x86/percpu: Use segment qualifiers

From: Uros Bizjak
Date: Sun Oct 01 2023 - 09:16:52 EST


This patchset resurrect the work of Richard Henderson [1] and Nadav Amit [2]
to introduce named address spaces compiler extension [3,4] into the linux kernel.

On the x86 target, variables may be declared as being relative to the %fs or %gs segments.

__seg_fs
__seg_gs

The object is accessed with the respective segment override prefix.

The following patchset takes a bit more cautious approach and converts only moves,
currently implemented as an asm, to generic moves to/from named address space. The
compiler is then able to propagate memory arguments into instructions that use these
memory references, producing more compact assembly, in addition to avoiding using
a register as a temporary to hold value from the memory.

The patchset enables propagation of hundreds of memory arguments, resulting in
the cumulative code size reduction of 7.94kB (please note that the kernel is
compiled with -O2, so the code size is not entirely correct measure; some
parts of the code can now be duplicated for better performance due to -O2,
etc...).

Some examples of propagations:

a) into sign/zero extensions:

110b54: 65 0f b6 05 00 00 00 movzbl %gs:0x0(%rip),%eax
11ab90: 65 0f b6 15 00 00 00 movzbl %gs:0x0(%rip),%edx
14484a: 65 0f b7 35 00 00 00 movzwl %gs:0x0(%rip),%esi
1a08a9: 65 0f b6 43 78 movzbl %gs:0x78(%rbx),%eax
1a08f9: 65 0f b6 43 78 movzbl %gs:0x78(%rbx),%eax

4ab29a: 65 48 63 15 00 00 00 movslq %gs:0x0(%rip),%rdx
4be128: 65 4c 63 25 00 00 00 movslq %gs:0x0(%rip),%r12
547468: 65 48 63 1f movslq %gs:(%rdi),%rbx
5474e7: 65 48 63 0a movslq %gs:(%rdx),%rcx
54d05d: 65 48 63 0d 00 00 00 movslq %gs:0x0(%rip),%rcx

b) into compares:

b40804: 65 f7 05 00 00 00 00 testl $0xf0000,%gs:0x0(%rip)
b487e8: 65 f7 05 00 00 00 00 testl $0xf0000,%gs:0x0(%rip)
b6f14c: 65 f6 05 00 00 00 00 testb $0x1,%gs:0x0(%rip)
bac1b8: 65 f6 05 00 00 00 00 testb $0x1,%gs:0x0(%rip)
df2244: 65 f7 05 00 00 00 00 testl $0xff00,%gs:0x0(%rip)

9a7517: 65 80 3d 00 00 00 00 cmpb $0x0,%gs:0x0(%rip)
b282ba: 65 44 3b 35 00 00 00 cmp %gs:0x0(%rip),%r14d
b48f61: 65 66 83 3d 00 00 00 cmpw $0x8,%gs:0x0(%rip)
b493fe: 65 80 38 00 cmpb $0x0,%gs:(%rax)
b73867: 65 66 83 3d 00 00 00 cmpw $0x8,%gs:0x0(%rip)

c) into other insns:

65ec02: 65 0f 44 15 00 00 00 cmove %gs:0x0(%rip),%edx
6c98ac: 65 0f 44 15 00 00 00 cmove %gs:0x0(%rip),%edx
9aafaf: 65 0f 44 15 00 00 00 cmove %gs:0x0(%rip),%edx
b45868: 65 0f 48 35 00 00 00 cmovs %gs:0x0(%rip),%esi
d276f8: 65 0f 44 15 00 00 00 cmove %gs:0x0(%rip),%edx

The above propagations result in the following code size
improvements for current mainline kernel (with the default config),
compiled with

gcc (GCC) 12.3.1 20230508 (Red Hat 12.3.1-1)

text data bss dec hex filename
25508862 4386540 808388 30703790 1d480ae vmlinux-vanilla.o
25500922 4386532 808388 30695842 1d461a2 vmlinux-new.o

The conversion of other read-modify-write instructions does not bring us any
benefits, the compiler has some problems when constructing RMW instructions
from the generic code and easily misses some opportunities.

There are other optimizations possible involving arch_raw_cpu_ptr and
aggressive caching of current that are implemented in the original
patch series. These can be implemented as follow-ups at some later
time.

The patcshet was tested on Fedora 38 with kernel 6.5.5 and gcc 13.2.1
(In fact, I'm writing this message on the patched kernel.)

[1] https://lore.kernel.org/lkml/1454483253-11246-1-git-send-email-rth@xxxxxxxxxxx/
[2] https://lore.kernel.org/lkml/20190823224424.15296-1-namit@xxxxxxxxxx/
[3] https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
[4] https://clang.llvm.org/docs/LanguageExtensions.html#target-specific-extensions

Cc: Andy Lutomirski <luto@xxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Nadav Amit <namit@xxxxxxxxxx>
Cc: Brian Gerst <brgerst@xxxxxxxxx>
Cc: Denys Vlasenko <dvlasenk@xxxxxxxxxx>
Cc: H. Peter Anvin <hpa@xxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>

Uros Bizjak (4):
x86/percpu: Update arch/x86/include/asm/percpu.h to the current tip
x86/percpu: Detect compiler support for named address spaces
x86/percpu: Use compiler segment prefix qualifier
x86/percpu: Use C for percpu read/write accessors

arch/x86/Kconfig | 7 +
arch/x86/include/asm/percpu.h | 237 ++++++++++++++++++++++++++++-----
arch/x86/include/asm/preempt.h | 2 +-
3 files changed, 209 insertions(+), 37 deletions(-)

--
2.41.0