Re: [RFC PATCH 0/4] x86/percpu: Use segment qualifiers

From: Linus Torvalds
Date: Sun Oct 01 2023 - 16:21:52 EST


On Sun, 1 Oct 2023 at 12:53, Uros Bizjak <ubizjak@xxxxxxxxx> wrote:
>
> Regarding x86 target specific code, the same functionality used for
> explicit address space is used internally to handle __thread
> qualifier.

Ok, that's interesting, in that __thread is certainly widely used so
it will have seen testing.

> Even *if* there are some issues with aliasing, the kernel
> is immune to them due to
>
> KBUILD_CFLAGS += -fno-strict-aliasing

It's not aliasing I'd worry about. It's correctness.

And indeed, the *very* first thing I tried shows that this is all very
very buggy in gcc.

What did I try? A simple memory copy with a structure assignment.

Try to compile this:

#include <string.h>
struct a { long arr[30]; };

__seg_fs struct a m;
void foo(struct a *dst) { *dst = m; }

using the kernel compiler options (it's the "don't use sse/avx" ones
that matter):

gcc -mno-avx -mno-sse -O2 -S t.c

and look at the end result. It's complete and utter sh*t:

foo:
xorl %eax, %eax
cmpq $240, %rax
jnb .L5
.L2:
movzbl %fs:m(%rax), %edx
movb %dl, (%rdi,%rax)
addq $1, %rax
cmpq $240, %rax
jb .L2
.L5:
ret

to the point that I can only go "WTF"?

I mean, it's not just that it does the copy one byte at a time. It
literally compares %rax to $240 just after it has cleared it. I look
at that code, and I go "a five-year old with a crayon could have done
better".

In other words, no, we're not using this thing that generates that
kind of garbage.

Somebody needs to open a bugzilla entry for this kind of code generation.

Clang isn't much better, but at least it doesn't generate bad code. It
just crashes with an internal compiler error on the above trivial
test-case:

fatal error: error in backend: cannot lower memory intrinsic in
address space 257

which at least tells the user that they can't copy memory from that
address space. But once again shows that no, this feature is not ready
for prime-time.

If literally the *first* thing I thought to test was this broken, what
else is broken in this model?

And no, the kernel doesn't currently do the above kinds of things.
That's not the point. The point was "how well is this compiler support
tested". The answer is "not at all".

Linus