Re: [PATCH RFC 16/43] x86-64: Use per-cpu stack canary if supported by compiler

From: Fangrui Song
Date: Fri May 05 2023 - 15:06:47 EST


On Fri, May 5, 2023 at 11:02 AM Nick Desaulniers
<ndesaulniers@xxxxxxxxxx> wrote:
>
> On Thu, May 4, 2023 at 11:14 PM Hou Wenlong <houwenlong.hwl@xxxxxxxxxxxx> wrote:
> >
> > On Tue, May 02, 2023 at 01:27:53AM +0800, Nick Desaulniers wrote:
> > > On Fri, Apr 28, 2023 at 2:52 AM Hou Wenlong <houwenlong.hwl@xxxxxxxxxxxx> wrote:
> > > >
> > > > +config CC_HAS_CUSTOMIZED_STACKPROTECTOR
> > > > + bool
> > > > + # Although clang supports -mstack-protector-guard-reg option, it
> > > > + # would generate GOT reference for __stack_chk_guard even with
> > > > + # -fno-PIE flag.
> > > > + default y if (!CC_IS_CLANG && $(cc-option,-mstack-protector-guard-reg=gs))
> > >
> > > Hi Hou,
> > > I've filed this bug against LLVM and will work with LLVM folks at
> > > Intel to resolve:
> > > https://github.com/llvm/llvm-project/issues/62481
> > > Can you please review that report and let me know here or there if I
> > > missed anything? Would you also mind including a link to that in the
> > > comments in the next version of this patch?
> > >
> > Hi Nick,
> >
> > Thanks for your help, I'll include the link in the next version.
> > Actually, I had post an issue on github too when I test the patch on
> > LLVM. But no replies. :(.
>
> Ah, sorry about that. The issue tracker is pretty high volume and
> stuff gets missed. With many users comes many bug reports. We could
> be better about triage though. If it's specific to the Linux kernel,
> https://github.com/ClangBuiltLinux/linux/issues is a better issue
> tracker to use; we can move bug reports upstream to
> https://github.com/llvm/llvm-project/issues/ when necessary. It's
> linked off of clangbuiltlinux.github.io if you lose it.
>
> > https://github.com/llvm/llvm-project/issues/60116
> >
> > There is another problem I met for this patch, some unexpected code
> > are generated:
> >
> > do_one_initcall: (init/main.o)
> > ......
> > movq __stack_chk_guard(%rip), %rax
> > movq %rax,0x2b0(%rsp)
> >
> > The compiler generates wrong instruction, no GOT reference and gs
> > register. I only see it in init/main.c file. I have tried to move the
> > function into another file and compiled it with same cflags. It could
> > generate right instruction for the function in another file.
>
> The wrong instruction or the wrong operand? This is loading the
> canary into the stack slot in the fn prolog. Perhaps the expected
> cflag is not getting set (or being removed) from init/main.c? You
> should be able to do:
>
> $ make LLVM=1 init/main.o V=1
>
> to see how clang was invoked to see if the expected flag was there, or not.
>
> >
> > The LLVM chain tools are built by myself:
> > clang version 15.0.7 (https://github.com/llvm/llvm-project.git
> > 8dfdcc7b7bf66834a761bd8de445840ef68e4d1a)
>
> Perhaps worth rebuilding with top of tree, which is clang 17.
>
> >
> > > Less relevant issues I filed looking at some related codegen:
> > > https://github.com/llvm/llvm-project/issues/62482
> > > https://github.com/llvm/llvm-project/issues/62480
> > >
> > > And we should probably look into:
> > > https://github.com/llvm/llvm-project/issues/22476
> > >
> > >
> >
> > Except for per-cpu stack canary patch, there is another issue I post on
> > github: https://github.com/llvm/llvm-project/issues/60096
>
> Thanks, I'll bring that up with Intel, too.
>
> >
> > The related patch is:
> > https://lore.kernel.org/lkml/175116f75c38c15d8d73a03301eab805fea13a0a.1682673543.git.houwenlong.hwl@xxxxxxxxxxxx/
> >
> > I couldn't find the related documentation about that, hope you can help
> > me too.
> >
> > One more problem that I didn't post is:
> > https://lore.kernel.org/lkml/8d6bbaf66b90cf1a8fd2c5da98f5e094b9ffcb27.1682673543.git.houwenlong.hwl@xxxxxxxxxxxx/
>
> Mind filing another bug for this in llvm's issue tracker? We can
> discuss there if LLD needs to be doing something different.
>
> Thanks for uncovering these and helping us get them fixed up!
> --
> Thanks,
> ~Nick Desaulniers

In my opinion, Clang's behavior is working as intended. The Linux
kernel should support R_X86_64_REX_GOTPCRELX, and the solution is
straightforward: treat R_X86_64_REX_GOTPCRELX the same way as
R_X86_64_PC32 (-shared -Bsymbolic), assuming that every symbol is
defined, which means that every symbol is non-preemptible.
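
As a minimal sketch of that suggestion (not the kernel's actual
relocation-processing code; the helper name `reloc_is_pc_relative` is
hypothetical, and only the relocation type numbers come from the x86-64
psABI), a post-link relocation processor could classify
`R_X86_64_REX_GOTPCRELX` alongside `R_X86_64_PC32`. This assumes the
linker has already turned the GOT load into a direct PC-relative access:

```c
#include <stdbool.h>
#include <stdint.h>

/* Relocation type numbers as defined by the x86-64 psABI. */
#define R_X86_64_PC32           2
#define R_X86_64_32             10
#define R_X86_64_REX_GOTPCRELX  42

/*
 * Hypothetical helper: returns true when the relocated field holds a
 * PC-relative displacement, so it needs no fixup if the whole image is
 * loaded at a different base address.
 */
static bool reloc_is_pc_relative(uint32_t r_type)
{
	switch (r_type) {
	case R_X86_64_PC32:
	/*
	 * Assumption: every symbol is defined and non-preemptible, so
	 * this relocation is handled exactly like R_X86_64_PC32.
	 */
	case R_X86_64_REX_GOTPCRELX:
		return true;
	default:
		/* Absolute relocations such as R_X86_64_32 land here. */
		return false;
	}
}
```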

With `-fno-pic`, Clang emits `R_X86_64_REX_GOTPCRELX`, which is
correct, although it differs from what GCC emits with `-fno-pic`.

The compiler doesn't know whether `__stack_chk_guard` will be provided
by the main executable (`libc.a`) or a shared object (`libc.so`;
available on some ports of glibc but not x86, while on musl this is
available for all ports).
(Also see `__stack_chk_guard` on
https://maskray.me/blog/2022-12-18-control-flow-integrity)

If an `R_X86_64_32` relocation is used and `__stack_chk_guard` is
defined by a shared object, we will need an ELF hack called [copy
relocation](https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected).

The instruction `movq __stack_chk_guard@GOTPCREL(%rip), %rbx` produces
an `R_X86_64_REX_GOTPCRELX` relocation.
If `__stack_chk_guard` is non-preemptible, linkers can [optimize the
access to be direct](https://maskray.me/blog/2021-08-29-all-about-global-offset-table#got-optimization).
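
To make the GOT optimization concrete, here is a rough sketch of the
instruction rewrite involved, assuming the psABI relaxation of
`mov foo@GOTPCREL(%rip), %reg` into `lea foo(%rip), %reg`. The helper
name `relax_got_load` and its interface are hypothetical and not taken
from any linker; recomputing the displacement so that it targets the
symbol instead of the GOT entry is not shown:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OPCODE_MOV_LOAD 0x8b /* mov: load a register from memory */
#define OPCODE_LEA      0x8d /* lea: compute the address only */

/*
 * Hypothetical helper. `disp_off` is the offset of the 4-byte
 * displacement that R_X86_64_REX_GOTPCRELX applies to; the REX prefix,
 * opcode and ModRM bytes sit at disp_off - 3, - 2 and - 1.
 */
static bool relax_got_load(uint8_t *code, size_t disp_off)
{
	if (disp_off < 3)
		return false;
	/* A REX prefix has the form 0100WRXB. */
	if ((code[disp_off - 3] & 0xf0) != 0x40)
		return false;
	if (code[disp_off - 2] != OPCODE_MOV_LOAD)
		return false;
	/* ModRM must select RIP-relative addressing: mod == 00, rm == 101. */
	if ((code[disp_off - 1] & 0xc7) != 0x05)
		return false;
	/* Same operands, but the address is computed instead of loaded. */
	code[disp_off - 2] = OPCODE_LEA;
	return true;
}
```

For `movq __stack_chk_guard@GOTPCREL(%rip), %rbx`, whose encoding
starts with the bytes `48 8b 1d`, this yields `48 8d 1d`, i.e.
`leaq __stack_chk_guard(%rip), %rbx`.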

Although we could technically use the
`-fno-direct-access-external-data` option to switch between
`R_X86_64_REX_GOTPCRELX` and `R_X86_64_32`, I think there is no
justification to complicate the compiler.



--
宋方睿