Re: [PATCH v4 00/16] x86-64: Stack protector and percpu improvements

From: Ard Biesheuvel
Date: Mon Mar 25 2024 - 12:54:09 EST


On Sun, 24 Mar 2024 at 00:55, Arnd Bergmann <arnd@xxxxxxxx> wrote:
>
> On Sat, Mar 23, 2024, at 17:16, Linus Torvalds wrote:
> >
..
> > So there doesn't seem to be a major reason to up the versioning, since
> > the stack protector thing can just be disabled for older versions.
> >
> > But maybe even enterprise distros have upgraded anyway, and we should
> > be proactive.
> >
> > Cc'ing Arnd, who has historically been one of the people pushing this.
> > He may no longer care because we haven't had huge issues.
>
> I'm not aware of any major issues, but it keeps coming up and
> a number of people have asked me about it because of smaller
> ones. Unfortunately I didn't write down what the problems
> were.
>
> I think based on what compiler versions are shipped
> by LTS distros, gcc-8.1 is the next logical step when we
> do it, covering Debian 10, RHEL 8 and Ubuntu 20.04, which
> are probably the oldest we still need to support.
>
> RHEL 7 and SLES 12 are still technically supported distros
> as well, but those shipped with gcc-4.8, so we dropped them
> in 2020 with the move to gcc-4.9.
>
> So in short, not a lot to gain or lose from raising the
> minimum to 8.1, but it would be nice to reduce the possible
> test matrix from 10 gcc versions back to the 7 we had in
> the past, as well as clean up a few version dependencies.
> Similarly we can probably raise the oldest binutils version
> to 2.30, as that seems to be the earliest version that was
> paired with gcc-8 (in RHEL-8).
>

x86_64/SMP uses a pile of hacks to create a runtime-relocatable
kernel, one of which is a workaround for the offset-based addressing
of per-CPU variables. This requires RIP-relative per-CPU references,
e.g.,

leal %gs:foo(%rip), %reg

to be fixed up in the opposite direction (displacement subtracted
rather than added) in the decompressor. This scheme is used because
older GCCs can only access the stack protector cookie via a fixed
offset of GS+40, and so GS must carry the address of the start of the
per-CPU region rather than an arbitrary relative offset between the
per-CPU region in vmlinux and the one belonging to a CPU.

GCC 8.1 and later allow the cookie to be specified using a symbol, and
this would allow us to revert to the ordinary per-CPU addressing,
where the base is the vmlinux copy of a symbol, and each CPU carries a
different offset in GS that produces the address of its respective
private copy. [0]
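
To make the difference between the two schemes concrete, here is a
minimal sketch in plain C that only models the address arithmetic. The
section address, the per-CPU area address and all function names are
hypothetical, and it assumes that in the current scheme a per-CPU
symbol's value is its offset from the start of the per-CPU section
(which is what makes the fixed GS+40 cookie access work).

```c
#include <stdint.h>

/* Hypothetical link-time address of the per-CPU section in vmlinux. */
#define VMLINUX_PERCPU_START 0xffffffff82a00000ull

/*
 * Current scheme: GS carries the absolute start address of this CPU's
 * per-CPU area, and a per-CPU symbol's value is its offset into the
 * section (so the cookie at offset 40 is simply GS+40).
 */
static uint64_t gs_base_absolute(uint64_t cpu_area_start)
{
	return cpu_area_start;
}

/*
 * Ordinary scheme: a per-CPU symbol's value is the address of the
 * vmlinux copy, and GS carries the distance from that copy to this
 * CPU's private copy. The subtraction may wrap; that is fine because
 * address arithmetic is modulo 2^64.
 */
static uint64_t gs_base_relative(uint64_t cpu_area_start)
{
	return cpu_area_start - VMLINUX_PERCPU_START;
}

/* Effective address of a GS-based access: GS base plus symbol value. */
static uint64_t gs_access(uint64_t gs_base, uint64_t sym_value)
{
	return gs_base + sym_value;
}
```

Both schemes resolve to the same private copy; what changes is whether
the symbol value is a bare offset (40) or an ordinary vmlinux address
(VMLINUX_PERCPU_START + 40), and only the latter lets the symbol be
relocated like any other.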

With that out of the way, we could get rid of the weird relocation
format and just use the linker to link vmlinux in PIE mode (like other
architectures), using the condensed RELR format which only takes a
fraction of the space. Using PIC codegen and PIE linking also brings
us closer to what toolchains expect, so we should see fewer
quirks/surprises when moving to newer versions. (Currently on x86, we do a position
dependent link of vmlinux, and rely on the static relocations produced
by --emit-relocs to create the metadata we need to perform the
relocation fixups. Static relocations cannot describe all the
transformations and relaxations that the linker might apply, and so
static relocations might go out of sync with the actual code.)
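
As a rough illustration of why RELR is so compact, the sketch below
decodes a RELR table against a simulated image, following the generic
RELR encoding as I understand it: an even entry names one word to
relocate, and each following odd entry is a bitmap covering the next 63
words. The function name, the array-based image (word i sits at
link-time address i * 8) and the delta parameter are assumptions made
for this example; this is not the kernel's relocation code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Apply RELR-encoded relative relocations to a simulated image.
 * image[] models the 64-bit words of the image, linked at base 0;
 * delta is the load bias added to every relocated word.
 */
static void apply_relr(uint64_t *image, const uint64_t *relr, size_t n,
		       uint64_t delta)
{
	uint64_t *where = NULL;

	for (size_t i = 0; i < n; i++) {
		uint64_t entry = relr[i];

		if ((entry & 1) == 0) {
			/* Even entry: address of a single word to relocate. */
			where = image + entry / sizeof(uint64_t);
			*where++ += delta;
		} else {
			/* Odd entry: bitmap for the 63 words after 'where'. */
			assert(where != NULL);
			for (size_t bit = 0; (entry >>= 1) != 0; bit++)
				if (entry & 1)
					where[bit] += delta;
			where += 63;
		}
	}
}
```

With this encoding a single 8-byte bitmap entry can stand in for up to
63 individual relocation records, which is where the space saving comes
from when relocated pointers are densely packed.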

Another minor cleanup is __GCC_ASM_FLAG_OUTPUTS__, which would always
be supported if we require 8.1 or newer.
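
For readers unfamiliar with the feature, here is a small sketch of what
asm flag outputs look like. The helper name is made up for this
example, and the plain C fallback only stands in for whatever a
compiler without the feature would need; it is not taken from the
kernel sources.

```c
#include <stdbool.h>

/* Hypothetical helper: report whether bit 'nr' (mod 64) of 'word' is set. */
static bool bit_is_set(unsigned long word, unsigned long nr)
{
#if defined(__x86_64__) && defined(__GCC_ASM_FLAG_OUTPUTS__)
	bool set;

	/*
	 * BT copies the selected bit into the carry flag; the "=@ccc"
	 * constraint hands CF to the compiler directly, so no SETC is
	 * needed inside the asm statement.
	 */
	__asm__("bt %2, %1" : "=@ccc" (set) : "r" (word), "r" (nr));
	return set;
#else
	/* Fallback for other targets or compilers without flag outputs. */
	return (word >> (nr % (8 * sizeof(word)))) & 1;
#endif
}
```

If the feature could be assumed unconditionally, the preprocessor
conditional around code like this would no longer need to test
__GCC_ASM_FLAG_OUTPUTS__.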

None of this is high priority, though, so it is not a reason in itself
to deprecate GCC 7 and older; it is more of a nice bonus when we do get
there.

--
Ard.



[0] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=x86-remove-absolute-percpu