Re: [PATCH 2/2] x86: Rewrite ret_from_fork() in C

From: Brian Gerst
Date: Thu Jun 22 2023 - 12:04:24 EST


On Thu, Jun 22, 2023 at 9:29 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Thu, Jun 22, 2023 at 08:07:50AM -0400, Brian Gerst wrote:
> > When kCFI is enabled, special handling is needed for the indirect call
> > to the kernel thread function. Rewrite the ret_from_fork() function in
> > C so that the compiler can properly handle the indirect call.
> >
> > Suggested-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> > Signed-off-by: Brian Gerst <brgerst@xxxxxxxxx>
>
> This is much nicer indeed. I'll take these patches into my series and
> repost later today if you don't mind.

Yes, that's fine.

> One little niggle below..
>
> > ---
>
> > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> > index f31e286c2977..5ee32e7e29e8 100644
> > --- a/arch/x86/entry/entry_64.S
> > +++ b/arch/x86/entry/entry_64.S
> > @@ -284,36 +284,21 @@ SYM_FUNC_END(__switch_to_asm)
> > * r12: kernel thread arg
> > */
> > .pushsection .text, "ax"
> > +SYM_CODE_START(ret_from_fork_asm)
> > UNWIND_HINT_END_OF_STACK
> > ANNOTATE_NOENDBR // copy_thread
> > CALL_DEPTH_ACCOUNT
> >
> > + /* return address for the stack unwinder */
> > + pushq $swapgs_restore_regs_and_return_to_usermode
> > + UNWIND_HINT_FUNC
> >
> > + movq %rax, %rdi /* prev */
> > + movq %rsp, %rsi /* regs */
> > + movq %rbx, %rdx /* fn */
> > + movq %r12, %rcx /* fn_arg */
> > + jmp ret_from_fork
> > +SYM_CODE_END(ret_from_fork_asm)
> > .popsection
> >
> > .macro DEBUG_ENTRY_ASSERT_IRQS_OFF
>
> > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > index dac41a0072ea..f5dbfebac076 100644
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -28,6 +28,7 @@
> > #include <linux/static_call.h>
> > #include <trace/events/power.h>
> > #include <linux/hw_breakpoint.h>
> > +#include <linux/entry-common.h>
> > #include <asm/cpu.h>
> > #include <asm/apic.h>
> > #include <linux/uaccess.h>
> > @@ -134,6 +135,25 @@ static int set_new_tls(struct task_struct *p, unsigned long tls)
> > return do_set_thread_area_64(p, ARCH_SET_FS, tls);
> > }
> >
> > +__visible noinstr void ret_from_fork(struct task_struct *prev, struct pt_regs *regs,
> > + int (*fn)(void *), void *fn_arg)
>
> So I had noinstr in my initial patch, but it leads to objtool
> complaints. I suppose we can actually handle tracing and all the other
> gunk at this point, so I've removed it.

I'm not an expert on noinstr usage, but looking at the other syscall
functions, instrumentation needs to be disabled before
syscall_exit_to_user_mode() is called. Perhaps this function needs an
instrumentation_begin()/instrumentation_end() pair?
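
A rough, untested sketch of that ordering, written as a user-space mock
rather than as the real patch. Everything other than the function
signature is an assumption for illustration: the struct layouts, the
trace buffer, mark(), demo_thread_fn() and the stub bodies are
hypothetical stand-ins, and the body of ret_from_fork() is only a guess
at what the C version does. In the kernel the function would keep its
__visible noinstr annotations.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-ins for the kernel types. */
struct task_struct { int pid; };
struct pt_regs { unsigned long ax; };

/* Records the order in which the stubs below are called. */
static char trace[128];

static void mark(const char *s)
{
	strncat(trace, s, sizeof(trace) - strlen(trace) - 1);
}

/* Stubs: the real helpers live in the kernel, these only log. */
static void instrumentation_begin(void) { mark("begin "); }
static void instrumentation_end(void) { mark("end "); }
static void schedule_tail(struct task_struct *prev) { (void)prev; mark("tail "); }
static void syscall_exit_to_user_mode(struct pt_regs *regs) { (void)regs; mark("exit "); }

/* Hypothetical kernel thread function used to exercise the fn path. */
static int demo_thread_fn(void *arg) { (void)arg; mark("fn "); return 0; }

/* Assumed body; in the kernel this would be __visible noinstr. */
void ret_from_fork(struct task_struct *prev, struct pt_regs *regs,
		   int (*fn)(void *), void *fn_arg)
{
	/* Instrumentable region: scheduler tail and the thread function. */
	instrumentation_begin();
	schedule_tail(prev);

	/* A non-NULL fn means this is a kernel thread. */
	if (fn) {
		fn(fn_arg);
		/* Assumed: a returning kernel thread exits to user mode with ax = 0. */
		regs->ax = 0;
	}
	instrumentation_end();

	/* Called with instrumentation disabled again. */
	syscall_exit_to_user_mode(regs);
}
```

The only point of the mock is the bracket placement: the indirect call
and schedule_tail() sit inside the begin/end pair, and
syscall_exit_to_user_mode() is reached only after
instrumentation_end().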

Brian Gerst