Re: [PATCH RFC 31/43] x86/modules: Adapt module loading for PIE support

From: Hou Wenlong
Date: Mon May 08 2023 - 04:32:38 EST


On Sat, Apr 29, 2023 at 03:29:32AM +0800, Ard Biesheuvel wrote:
> On Fri, 28 Apr 2023 at 10:53, Hou Wenlong <houwenlong.hwl@xxxxxxxxxxxx> wrote:
> >
> > Adapt module loading to support PIE relocations. No GOT is generared for
> > module, all the GOT entry of got references in module should exist in
> > kernel GOT. Currently, there is only one usable got reference for
> > __fentry__().
> >
>
> I don't think this is the right approach. We should permit GOTPCREL
> relocations properly, which means making them point to a location in
> memory that carries the absolute address of the symbol. There are
> several ways to go about that, but perhaps the simplest way is to make
> the symbol address in ksymtab a 64-bit absolute value (but retain the
> PC32 references for the symbol name and the symbol namespace name).
> That way, you can always resolve such GOTPCREL relocations by pointing
> it to the ksymtab entry. Another option would be to take inspiration
> from the PLT code we have on ARM and arm64 (and other architectures,
> surely) and to count the GOT based relocations, allocate some extra
> r/o module space for each, and allocate slots and populate them with
> the right value as you fix up the relocations.
>
> Then, many such relocations can be relaxed at module load time if the
> symbol is in range. IIUC, the module and kernel will still be inside
> the same 2G window even after widening the KASLR range to 512G, so
> most GOT loads can be converted into RIP relative LEA instructions.
>
> Note that this will also permit you to do things like
>
> #define PV_VCPU_PREEMPTED_ASM \
> "leaq __per_cpu_offset(%rip), %rax \n\t" \
> "movq (%rax,%rdi,8), %rax \n\t" \
> "addq steal_time@GOTPCREL(%rip), %rax \n\t" \
> "cmpb $0, " __stringify(KVM_STEAL_TIME_preempted) "(%rax) \n\t" \
> "setne %al\n\t"
>
> or
>
> +#ifdef CONFIG_X86_PIE
> + " pushq arch_rethook_trampoline@GOTPCREL(%rip)\n"
> +#else
> " pushq $arch_rethook_trampoline\n"
> +#endif
>
> instead of having these kludgy push/pop sequences to free up temp registers.
>
> (FYI I have looked into this PIE linking just a few weeks ago [0] so
> this is all rather fresh in my memory)
>
>
>
>
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=x86-pie
>
>
Hi Ard,
Thanks for providing the link, it has been very helpful for me as I am
new to the topic of compilers. One key difference I noticed is that you
linked the kernel with "-pie" instead of "--emit-reloc". I also noticed
that Thomas' initial patchset[0] used "-pie", but in RFC v3 [1], it
switched to "--emit-reloc" in order to reduce dynamic relocation space
on mapped memory.

The another issue is that it requires the addition of the
"-mrelax-relocations=no" option to support older compilers and linkers.
R_X86_64_GOTPCRELX and R_X86_64_REX_GOTPCRELX relocations are supported
in binutils 2.26 and later, but the mini version required for the kernel
is 2.25. This option disables relocation relaxation, which makes GOT not
empty. I also noticed this option in arch/x86/boot/compressed/Makefile
with the reason given in [2]. Without relocation relaxation, GOT
references would increase the size of GOT. Therefore, I do not want to
use GOT reference in assembly directly. However, I realized that the
compiler could still generate GOT references in some cases such as
"fentry" calls and stack canary references.

Regarding module loading, I agree that we should support GOT reference
for the module itself. I will refactor it according to your suggestion.

Thanks.

[0] https://yhbt.net/lore/all/20170718223333.110371-20-thgarnie@xxxxxxxxxx
[1] https://yhbt.net/lore/all/20171004212003.28296-1-thgarnie@xxxxxxxxxx
[2] https://lore.kernel.org/all/20200903203053.3411268-2-samitolvanen@xxxxxxxxxx/

> > Signed-off-by: Hou Wenlong <houwenlong.hwl@xxxxxxxxxxxx>
> > Cc: Thomas Garnier <thgarnie@xxxxxxxxxxxx>
> > Cc: Lai Jiangshan <jiangshan.ljs@xxxxxxxxxxxx>
> > Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> > ---
> > arch/x86/include/asm/sections.h | 5 +++++
> > arch/x86/kernel/module.c | 27 +++++++++++++++++++++++++++
> > 2 files changed, 32 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/sections.h b/arch/x86/include/asm/sections.h
> > index a6e8373a5170..dc1c2b08ec48 100644
> > --- a/arch/x86/include/asm/sections.h
> > +++ b/arch/x86/include/asm/sections.h
> > @@ -12,6 +12,11 @@ extern char __end_rodata_aligned[];
> >
> > #if defined(CONFIG_X86_64)
> > extern char __end_rodata_hpage_align[];
> > +
> > +#ifdef CONFIG_X86_PIE
> > +extern char __start_got[], __end_got[];
> > +#endif
> > +
> > #endif
> >
> > extern char __end_of_kernel_reserve[];
> > diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
> > index 84ad0e61ba6e..051f88e6884e 100644
> > --- a/arch/x86/kernel/module.c
> > +++ b/arch/x86/kernel/module.c
> > @@ -129,6 +129,18 @@ int apply_relocate(Elf32_Shdr *sechdrs,
> > return 0;
> > }
> > #else /*X86_64*/
> > +#ifdef CONFIG_X86_PIE
> > +static u64 find_got_kernel_entry(Elf64_Sym *sym, const Elf64_Rela *rela)
> > +{
> > + u64 *pos;
> > +
> > + for (pos = (u64 *)__start_got; pos < (u64 *)__end_got; pos++)
> > + if (*pos == sym->st_value)
> > + return (u64)pos + rela->r_addend;
> > + return 0;
> > +}
> > +#endif
> > +
> > static int __write_relocate_add(Elf64_Shdr *sechdrs,
> > const char *strtab,
> > unsigned int symindex,
> > @@ -171,6 +183,7 @@ static int __write_relocate_add(Elf64_Shdr *sechdrs,
> > case R_X86_64_64:
> > size = 8;
> > break;
> > +#ifndef CONFIG_X86_PIE
> > case R_X86_64_32:
> > if (val != *(u32 *)&val)
> > goto overflow;
> > @@ -181,6 +194,13 @@ static int __write_relocate_add(Elf64_Shdr *sechdrs,
> > goto overflow;
> > size = 4;
> > break;
> > +#else
> > + case R_X86_64_GOTPCREL:
> > + val = find_got_kernel_entry(sym, rel);
> > + if (!val)
> > + goto unexpected_got_reference;
> > + fallthrough;
> > +#endif
> > case R_X86_64_PC32:
> > case R_X86_64_PLT32:
> > val -= (u64)loc;
> > @@ -214,11 +234,18 @@ static int __write_relocate_add(Elf64_Shdr *sechdrs,
> > }
> > return 0;
> >
> > +#ifdef CONFIG_X86_PIE
> > +unexpected_got_reference:
> > + pr_err("Target got entry doesn't exist in kernel got, loc %p\n", loc);
> > + return -ENOEXEC;
> > +#else
> > overflow:
> > pr_err("overflow in relocation type %d val %Lx\n",
> > (int)ELF64_R_TYPE(rel[i].r_info), val);
> > pr_err("`%s' likely not compiled with -mcmodel=kernel\n",
> > me->name);
> > +#endif
> > +
> > return -ENOEXEC;
> > }
> >
> > --
> > 2.31.1
> >