Re: [PATCH RFC 31/43] x86/modules: Adapt module loading for PIE support

From: Ard Biesheuvel
Date: Fri Apr 28 2023 - 15:29:51 EST


On Fri, 28 Apr 2023 at 10:53, Hou Wenlong <houwenlong.hwl@xxxxxxxxxxxx> wrote:
>
> Adapt module loading to support PIE relocations. No GOT is generared for
> module, all the GOT entry of got references in module should exist in
> kernel GOT. Currently, there is only one usable got reference for
> __fentry__().
>

I don't think this is the right approach. We should permit GOTPCREL
relocations properly, which means making them point to a location in
memory that carries the absolute address of the symbol. There are
several ways to go about that, but perhaps the simplest way is to make
the symbol address in ksymtab a 64-bit absolute value (but retain the
PC32 references for the symbol name and the symbol namespace name).
That way, you can always resolve such GOTPCREL relocations by pointing
it to the ksymtab entry. Another option would be to take inspiration
from the PLT code we have on ARM and arm64 (and other architectures,
surely) and to count the GOT based relocations, allocate some extra
r/o module space for each, and allocate slots and populate them with
the right value as you fix up the relocations.

Then, many such relocations can be relaxed at module load time if the
symbol is in range. IIUC, the module and kernel will still be inside
the same 2G window even after widening the KASLR range to 512G, so
most GOT loads can be converted into RIP relative LEA instructions.

Note that this will also permit you to do things like

#define PV_VCPU_PREEMPTED_ASM \
"leaq __per_cpu_offset(%rip), %rax \n\t" \
"movq (%rax,%rdi,8), %rax \n\t" \
"addq steal_time@GOTPCREL(%rip), %rax \n\t" \
"cmpb $0, " __stringify(KVM_STEAL_TIME_preempted) "(%rax) \n\t" \
"setne %al\n\t"

or

+#ifdef CONFIG_X86_PIE
+ " pushq arch_rethook_trampoline@GOTPCREL(%rip)\n"
+#else
" pushq $arch_rethook_trampoline\n"
+#endif

instead of having these kludgy push/pop sequences to free up temp registers.

(FYI I have looked into this PIE linking just a few weeks ago [0] so
this is all rather fresh in my memory)




[0] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=x86-pie


> Signed-off-by: Hou Wenlong <houwenlong.hwl@xxxxxxxxxxxx>
> Cc: Thomas Garnier <thgarnie@xxxxxxxxxxxx>
> Cc: Lai Jiangshan <jiangshan.ljs@xxxxxxxxxxxx>
> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> ---
> arch/x86/include/asm/sections.h | 5 +++++
> arch/x86/kernel/module.c | 27 +++++++++++++++++++++++++++
> 2 files changed, 32 insertions(+)
>
> diff --git a/arch/x86/include/asm/sections.h b/arch/x86/include/asm/sections.h
> index a6e8373a5170..dc1c2b08ec48 100644
> --- a/arch/x86/include/asm/sections.h
> +++ b/arch/x86/include/asm/sections.h
> @@ -12,6 +12,11 @@ extern char __end_rodata_aligned[];
>
> #if defined(CONFIG_X86_64)
> extern char __end_rodata_hpage_align[];
> +
> +#ifdef CONFIG_X86_PIE
> +extern char __start_got[], __end_got[];
> +#endif
> +
> #endif
>
> extern char __end_of_kernel_reserve[];
> diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
> index 84ad0e61ba6e..051f88e6884e 100644
> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -129,6 +129,18 @@ int apply_relocate(Elf32_Shdr *sechdrs,
> return 0;
> }
> #else /*X86_64*/
> +#ifdef CONFIG_X86_PIE
> +static u64 find_got_kernel_entry(Elf64_Sym *sym, const Elf64_Rela *rela)
> +{
> + u64 *pos;
> +
> + for (pos = (u64 *)__start_got; pos < (u64 *)__end_got; pos++)
> + if (*pos == sym->st_value)
> + return (u64)pos + rela->r_addend;
> + return 0;
> +}
> +#endif
> +
> static int __write_relocate_add(Elf64_Shdr *sechdrs,
> const char *strtab,
> unsigned int symindex,
> @@ -171,6 +183,7 @@ static int __write_relocate_add(Elf64_Shdr *sechdrs,
> case R_X86_64_64:
> size = 8;
> break;
> +#ifndef CONFIG_X86_PIE
> case R_X86_64_32:
> if (val != *(u32 *)&val)
> goto overflow;
> @@ -181,6 +194,13 @@ static int __write_relocate_add(Elf64_Shdr *sechdrs,
> goto overflow;
> size = 4;
> break;
> +#else
> + case R_X86_64_GOTPCREL:
> + val = find_got_kernel_entry(sym, rel);
> + if (!val)
> + goto unexpected_got_reference;
> + fallthrough;
> +#endif
> case R_X86_64_PC32:
> case R_X86_64_PLT32:
> val -= (u64)loc;
> @@ -214,11 +234,18 @@ static int __write_relocate_add(Elf64_Shdr *sechdrs,
> }
> return 0;
>
> +#ifdef CONFIG_X86_PIE
> +unexpected_got_reference:
> + pr_err("Target got entry doesn't exist in kernel got, loc %p\n", loc);
> + return -ENOEXEC;
> +#else
> overflow:
> pr_err("overflow in relocation type %d val %Lx\n",
> (int)ELF64_R_TYPE(rel[i].r_info), val);
> pr_err("`%s' likely not compiled with -mcmodel=kernel\n",
> me->name);
> +#endif
> +
> return -ENOEXEC;
> }
>
> --
> 2.31.1
>