[PATCH 0/4] x86/percpu: Use segment qualifiers

From: Uros Bizjak
Date: Wed Oct 04 2023 - 10:51:56 EST


This patchset resurrect the work of Richard Henderson [1] and Nadav
Amit [2] to introduce named address spaces compiler extension [3,4]
into the linux kernel.

On the x86 target, variables may be declared as being relative to
the %fs or %gs segments.

__seg_fs
__seg_gs

The object is accessed with the respective segment override prefix.

The following patchset takes a bit more cautious approach and converts
only moves, currently implemented as an asm, to generic moves to/from
named address space. The compiler is then able to propagate memory
arguments into instructions that use these memory references, producing
more compact assembly, in addition to avoiding using a register as a
temporary to hold value from the memory.

The patchset enables propagation of hundreds of memory arguments,
resulting in the cumulative code size reduction of 7.94kB (please note
that the kernel is compiled with -O2, so the code size is not entirely
correct measure; some parts of the code can now be duplicated for
better performance due to -O2, etc...).

Some examples of propagations:

a) into sign/zero extensions:

110b54: 65 0f b6 05 00 00 00 movzbl %gs:0x0(%rip),%eax
11ab90: 65 0f b6 15 00 00 00 movzbl %gs:0x0(%rip),%edx
14484a: 65 0f b7 35 00 00 00 movzwl %gs:0x0(%rip),%esi
1a08a9: 65 0f b6 43 78 movzbl %gs:0x78(%rbx),%eax
1a08f9: 65 0f b6 43 78 movzbl %gs:0x78(%rbx),%eax

4ab29a: 65 48 63 15 00 00 00 movslq %gs:0x0(%rip),%rdx
4be128: 65 4c 63 25 00 00 00 movslq %gs:0x0(%rip),%r12
547468: 65 48 63 1f movslq %gs:(%rdi),%rbx
5474e7: 65 48 63 0a movslq %gs:(%rdx),%rcx
54d05d: 65 48 63 0d 00 00 00 movslq %gs:0x0(%rip),%rcx

b) into compares:

b40804: 65 f7 05 00 00 00 00 testl $0xf0000,%gs:0x0(%rip)
b487e8: 65 f7 05 00 00 00 00 testl $0xf0000,%gs:0x0(%rip)
b6f14c: 65 f6 05 00 00 00 00 testb $0x1,%gs:0x0(%rip)
bac1b8: 65 f6 05 00 00 00 00 testb $0x1,%gs:0x0(%rip)
df2244: 65 f7 05 00 00 00 00 testl $0xff00,%gs:0x0(%rip)

9a7517: 65 80 3d 00 00 00 00 cmpb $0x0,%gs:0x0(%rip)
b282ba: 65 44 3b 35 00 00 00 cmp %gs:0x0(%rip),%r14d
b48f61: 65 66 83 3d 00 00 00 cmpw $0x8,%gs:0x0(%rip)
b493fe: 65 80 38 00 cmpb $0x0,%gs:(%rax)
b73867: 65 66 83 3d 00 00 00 cmpw $0x8,%gs:0x0(%rip)

c) into other insns:

65ec02: 65 0f 44 15 00 00 00 cmove %gs:0x0(%rip),%edx
6c98ac: 65 0f 44 15 00 00 00 cmove %gs:0x0(%rip),%edx
9aafaf: 65 0f 44 15 00 00 00 cmove %gs:0x0(%rip),%edx
b45868: 65 0f 48 35 00 00 00 cmovs %gs:0x0(%rip),%esi
d276f8: 65 0f 44 15 00 00 00 cmove %gs:0x0(%rip),%edx

The above propagations result in the following code size
improvements for current mainline kernel (with the default config),
compiled with

gcc (GCC) 12.3.1 20230508 (Red Hat 12.3.1-1)

text data bss dec hex filename
25508862 4386540 808388 30703790 1d480ae vmlinux-vanilla.o
25500922 4386532 808388 30695842 1d461a2 vmlinux-new.o

The conversion of other read-modify-write instructions does not bring
us any benefits, the compiler has some problems when constructing RMW
instructions from the generic code and easily misses some opportunities.

There are other optimizations possible involving arch_raw_cpu_ptr and
aggressive caching of current that are implemented in the original
patch series. These can be implemented as follow-ups at some later
time.

The patcshet was tested on Fedora 38 with kernel 6.5.5 and gcc 13.2.1
(In fact, I'm writing this message on the patched kernel.)

[1] https://lore.kernel.org/lkml/1454483253-11246-1-git-send-email-rth@xxxxxxxxxxx/
[2] https://lore.kernel.org/lkml/20190823224424.15296-1-namit@xxxxxxxxxx/
[3] https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
[4] https://clang.llvm.org/docs/LanguageExtensions.html#target-specific-extensions

Cc: Andy Lutomirski <luto@xxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Nadav Amit <namit@xxxxxxxxxx>
Cc: Brian Gerst <brgerst@xxxxxxxxx>
Cc: Denys Vlasenko <dvlasenk@xxxxxxxxxx>
Cc: H. Peter Anvin <hpa@xxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>

Uros Bizjak (4):
x86/percpu: Update arch/x86/include/asm/percpu.h to the current tip
x86/percpu: Enable named address spaces with known compiler version
x86/percpu: Use compiler segment prefix qualifier
x86/percpu: Use C for percpu read/write accessors

arch/x86/Kconfig | 7 +
arch/x86/include/asm/percpu.h | 237 ++++++++++++++++++++++++++++-----
arch/x86/include/asm/preempt.h | 2 +-
3 files changed, 209 insertions(+), 37 deletions(-)

--
2.41.0