Re: [PATCH 2/2] x86: Rewrite ret_from_fork() in C

From: Peter Zijlstra
Date: Thu Jun 22 2023 - 09:29:28 EST


On Thu, Jun 22, 2023 at 08:07:50AM -0400, Brian Gerst wrote:
> When kCFI is enabled, special handling is needed for the indirect call
> to the kernel thread function. Rewrite the ret_from_fork() function in
> C so that the compiler can properly handle the indirect call.
>
> Suggested-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> Signed-off-by: Brian Gerst <brgerst@xxxxxxxxx>

This is much nicer indeed. I'll take these patches into my series and
repost later today if you don't mind.

One small nit below.

> ---

> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index f31e286c2977..5ee32e7e29e8 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -284,36 +284,21 @@ SYM_FUNC_END(__switch_to_asm)
> * r12: kernel thread arg
> */
> .pushsection .text, "ax"
> +SYM_CODE_START(ret_from_fork_asm)
> UNWIND_HINT_END_OF_STACK
> ANNOTATE_NOENDBR // copy_thread
> CALL_DEPTH_ACCOUNT
>
> + /* return address for the stack unwinder */
> + pushq $swapgs_restore_regs_and_return_to_usermode
> + UNWIND_HINT_FUNC
>
> + movq %rax, %rdi /* prev */
> + movq %rsp, %rsi /* regs */
> + movq %rbx, %rdx /* fn */
> + movq %r12, %rcx /* fn_arg */
> + jmp ret_from_fork
> +SYM_CODE_END(ret_from_fork_asm)
> .popsection
>
> .macro DEBUG_ENTRY_ASSERT_IRQS_OFF

> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index dac41a0072ea..f5dbfebac076 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -28,6 +28,7 @@
> #include <linux/static_call.h>
> #include <trace/events/power.h>
> #include <linux/hw_breakpoint.h>
> +#include <linux/entry-common.h>
> #include <asm/cpu.h>
> #include <asm/apic.h>
> #include <linux/uaccess.h>
> @@ -134,6 +135,25 @@ static int set_new_tls(struct task_struct *p, unsigned long tls)
> return do_set_thread_area_64(p, ARCH_SET_FS, tls);
> }
>
> +__visible noinstr void ret_from_fork(struct task_struct *prev, struct pt_regs *regs,
> + int (*fn)(void *), void *fn_arg)

So I had noinstr in my initial patch, but it leads to objtool
complaints. I suppose we can actually handle tracing and all the other
instrumentation at this point, so I've removed it.

The alternative is to use __noinstr_section(".text") if we really want
to suppress all of the instrumentation.
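
To make the two options concrete, here is a rough user-space sketch of the
difference. The mock_* macros are hypothetical stand-ins, not the kernel
definitions: as far as I recall, the real __noinstr_section() also adds
notrace and the no-sanitizer attributes, and noinstr is the same thing with
".noinstr.text" as the section. The sketch assumes GCC or Clang on an ELF
target. The point is only that the attributes stay the same while the section
changes, and my understanding is that objtool's noinstr validation keys off
the section.

```c
#include <assert.h>

/*
 * Hypothetical stand-ins for the kernel macros. Only noinline and the
 * section placement are modelled here; the kernel versions carry more
 * attributes (assumption based on include/linux/compiler_types.h).
 */
#define mock_noinstr_section(sec) __attribute__((noinline, section(sec)))
#define mock_noinstr              mock_noinstr_section(".noinstr.text")

/* Option 1: placed in .noinstr.text, the section objtool validates. */
mock_noinstr int in_noinstr_section(int x)
{
	return x + 1;
}

/* Option 2: same attributes, but the code stays in plain .text. */
mock_noinstr_section(".text") int in_plain_text(int x)
{
	return x + 2;
}
```

Both functions behave identically at run time; only the section they are
emitted into differs.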

> +{
> +	schedule_tail(prev);
> +
> +	/* Is this a kernel thread? */
> +	if (unlikely(fn)) {
> +		fn(fn_arg);
> +		/*
> +		 * A kernel thread is allowed to return here after successfully
> +		 * calling kernel_execve().  Exit to userspace to complete the
> +		 * execve() syscall.
> +		 */
> +		regs->ax = 0;
> +	}
> +
> +	syscall_exit_to_user_mode(regs);
> +}
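
For anyone following along, the control flow of the new C function can be
modelled in user space roughly as below. This is only a sketch: struct
mock_regs, the mock_* helpers and the sentinel value 1234 are all made up for
illustration, and the prev argument is dropped because the mock
schedule_tail() does not need it. It shows the two paths: a kernel thread
(fn != NULL) runs fn and then has ax cleared before the exit path, while a
normal fork (fn == NULL) goes straight to the exit path with its registers
untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pt_regs; only ax is modelled. */
struct mock_regs {
	unsigned long ax;
};

/* Counters recording how often each mocked helper ran. */
int tail_calls;
int exit_calls;

/* Stand-in for schedule_tail(): just records the call. */
void mock_schedule_tail(void)
{
	tail_calls++;
}

/* Stand-in for syscall_exit_to_user_mode(): just records the call. */
void mock_exit_to_user_mode(struct mock_regs *regs)
{
	(void)regs;
	exit_calls++;
}

/* Same shape as the ret_from_fork() in the patch, minus prev. */
void mock_ret_from_fork(struct mock_regs *regs, int (*fn)(void *), void *fn_arg)
{
	assert(regs != NULL);
	mock_schedule_tail();

	/* Kernel thread: call the thread function through the pointer. */
	if (fn) {
		fn(fn_arg);
		/* fn returned, so report 0 as the syscall return value. */
		regs->ax = 0;
	}

	mock_exit_to_user_mode(regs);
}

/* Example thread function: increments the int it is handed. */
int mock_thread_fn(void *arg)
{
	*(int *)arg += 1;
	return 0;
}

/* Kernel-thread path: returns the ax value seen after the call. */
unsigned long run_kernel_thread_case(int *counter)
{
	struct mock_regs regs = { .ax = 1234 }; /* arbitrary sentinel */

	mock_ret_from_fork(&regs, mock_thread_fn, counter);
	return regs.ax;
}

/* Plain fork path: fn is NULL, so ax must be left alone. */
unsigned long run_user_fork_case(void)
{
	struct mock_regs regs = { .ax = 1234 }; /* arbitrary sentinel */

	mock_ret_from_fork(&regs, NULL, NULL);
	return regs.ax;
}
```

The indirect call fn(fn_arg) is the piece that motivated the rewrite: once it
is emitted by the compiler rather than written in assembly, kCFI can
instrument it like any other indirect call.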