Re: [PATCH 6/8] ftrace: Fix dead loop caused by direct call in ftrace selftest

From: Florent Revest
Date: Fri Feb 03 2023 - 07:35:31 EST


On Thu, Feb 2, 2023 at 8:03 PM Mark Rutland <mark.rutland@xxxxxxx> wrote:
>
> On Wed, Feb 01, 2023 at 05:34:18PM +0100, Florent Revest wrote:
> > From: Xu Kuohai <xukuohai@xxxxxxxxxx>
> >
> > After direct call is enabled for arm64, ftrace selftest enters a
> > dead loop:
> >
> > <trace_selftest_dynamic_test_func>:
> > 00 bti c
> > 01 mov x9, x30 <trace_direct_tramp>:
> > 02 bl <trace_direct_tramp> ----------> ret
> > |
> > lr/x30 is 03, return to 03
> > |
> > 03 mov w0, #0x0 <-----------------------------|
> > | |
> > | dead loop! |
> > | |
> > 04 ret ---- lr/x30 is still 03, go back to 03 ----|
> >
> > The reason is that when the direct caller trace_direct_tramp() returns
> > to the patched function trace_selftest_dynamic_test_func(), lr is still
> > the address after the instrumented instruction in the patched function,
> > so when the patched function exits, it returns to itself!
> >
> > To fix this issue, we need to restore lr before trace_direct_tramp()
> > exits, so use a dedicated trace_direct_tramp() for arm64.
> >
> > Reported-by: Li Huafei <lihuafei1@xxxxxxxxxx>
> > Signed-off-by: Xu Kuohai <xukuohai@xxxxxxxxxx>
> > Acked-by: Steven Rostedt (Google) <rostedt@xxxxxxxxxxx>
> > Signed-off-by: Florent Revest <revest@xxxxxxxxxxxx>
>
> Looking at this, I don't think that the existing trace_direct_tramp makes
> sense -- in general a C function doesn't follow the direct call trampoline
> calling convention, and cannot be used as a direct call trampoline.

Agreed.

> Looking further, there's a distinct latent issue on s390, where the mismatch
> between their regular calling convention and their direct call trampoline
> calling convention means that trace_direct_tramp() returns into the *caller* of
> the instrumented function, skipping that entirely (verified locally with QEMU
> and printk()s added to DYN_FTRACE_TEST_NAME() / DYN_FTRACE_TEST_NAME2()
> functions).

Good catch! I didn't dare to adventure that far into s390 :)

> I think it'd be much better to do something like the below as a preparatory
> cleanup (tested on s390 under QEMU).

Thanks, that looks great to me. I'll make it a part of the series in v2 then.
Unless it's preferred that this gets merged separately?

> Thanks,
> Mark.
>
> ---->8----
> From 3b3abc89fe014ea49282622c4312521b710d1463 Mon Sep 17 00:00:00 2001
> From: Mark Rutland <mark.rutland@xxxxxxx>
> Date: Thu, 2 Feb 2023 18:37:40 +0000
> Subject: [PATCH] ftrace: selftest: remove broken trace_direct_tramp
>
> The ftrace selftest code has a trace_direct_tramp() function which it
> uses as a direct call trampoline. This happens to work on x86, since the
> direct call's return address is in the usual place, and can be returned
> to via a RET, but in general the calling convention for direct calls is
> different from regular function calls, and requires a trampoline written
> in assembly.
>
> On s390, regular function calls place the return address in %r14, and an
> ftrace patch-site in an instrumented function places the trampoline's
> return address (which is within the instrumented function) in %r0,
> preserving the original %r14 value in-place. As a regular C function
> will return to the address in %r14, using a C function as the trampoline
> results in the trampoline returning to the caller of the instrumented
> function, skipping the body of the instrumented function.
>
> Note that the s390 issue is not detcted by the ftrace selftest code, as
> the instrumented function is trivial, and returning back into the caller
> happens to be equivalent.
>
> On arm64, regular function calls place the return address in x30, and
> an ftrace patch-site in an instrumented function saves this into r9
> and places the trampoline's return address (within the instrumented
> function) in x30. A regular C function will return to the address in
> x30, but will not restore x9 into x30. Consequently, using a C function
> as the trampoline results in returning to the trampoline's return
> address having corrupted x30, such that when the instrumented function
> returns, it will return back into itself.
>
> To avoid future issues in this area, remove the trace_direct_tramp()
> function, and require that each architecture with direct calls provides
> a stub trampoline, named ftrace_stub_direct_tramp. This can be written
> to handle the architecture's trampoline calling convention, and in
> future could be used elsewhere (e.g. in the ftrace ops sample, to
> measure the overhead of direct calls), so we may as well always build it
> in.
>
> Signed-off-by: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: Li Huafei <lihuafei1@xxxxxxxxxx>
> Cc: Xu Kuohai <xukuohai@xxxxxxxxxx>
> Cc: Steven Rostedt (Google) <rostedt@xxxxxxxxxxx>
> Cc: Florent Revest <revest@xxxxxxxxxxxx>
> ---
> arch/s390/kernel/mcount.S | 5 +++++
> arch/x86/kernel/ftrace_32.S | 5 +++++
> arch/x86/kernel/ftrace_64.S | 4 ++++
> include/linux/ftrace.h | 2 ++
> kernel/trace/trace_selftest.c | 15 ++-------------
> 5 files changed, 18 insertions(+), 13 deletions(-)
>
> diff --git a/arch/s390/kernel/mcount.S b/arch/s390/kernel/mcount.S
> index 4786bfe02144..ad13a0e2c307 100644
> --- a/arch/s390/kernel/mcount.S
> +++ b/arch/s390/kernel/mcount.S
> @@ -32,6 +32,11 @@ ENTRY(ftrace_stub)
> BR_EX %r14
> ENDPROC(ftrace_stub)
>
> +SYM_CODE_START(ftrace_stub_direct_tramp)
> + lgr %r1, %r0
> + BR_EX %r1
> +SYM_CODE_END(ftrace_stub_direct_tramp)
> +
> .macro ftrace_regs_entry, allregs=0
> stg %r14,(__SF_GPRS+8*8)(%r15) # save traced function caller
>
> diff --git a/arch/x86/kernel/ftrace_32.S b/arch/x86/kernel/ftrace_32.S
> index a0ed0e4a2c0c..0d9a14528176 100644
> --- a/arch/x86/kernel/ftrace_32.S
> +++ b/arch/x86/kernel/ftrace_32.S
> @@ -163,6 +163,11 @@ SYM_INNER_LABEL(ftrace_regs_call, SYM_L_GLOBAL)
> jmp .Lftrace_ret
> SYM_CODE_END(ftrace_regs_caller)
>
> +SYM_FUNC_START(ftrace_stub_direct_tramp)
> + CALL_DEPTH_ACCOUNT
> + RET
> +SYM_FUNC_END(ftrace_stub_direct_tramp)
> +
> #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> SYM_CODE_START(ftrace_graph_caller)
> pushl %eax
> diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S
> index 1265ad519249..8fc77e3e039c 100644
> --- a/arch/x86/kernel/ftrace_64.S
> +++ b/arch/x86/kernel/ftrace_64.S
> @@ -307,6 +307,10 @@ SYM_INNER_LABEL(ftrace_regs_caller_end, SYM_L_GLOBAL)
> SYM_FUNC_END(ftrace_regs_caller)
> STACK_FRAME_NON_STANDARD_FP(ftrace_regs_caller)
>
> +SYM_FUNC_START(ftrace_stub_direct_tramp)
> + CALL_DEPTH_ACCOUNT
> + RET
> +SYM_FUNC_END(ftrace_stub_direct_tramp)
>
> #else /* ! CONFIG_DYNAMIC_FTRACE */
>
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 3d2156e335d7..a9836b40630e 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -412,6 +412,8 @@ int unregister_ftrace_direct(struct ftrace_ops *ops, unsigned long addr);
> int modify_ftrace_direct(struct ftrace_ops *ops, unsigned long addr);
> int modify_ftrace_direct_nolock(struct ftrace_ops *ops, unsigned long addr);
>
> +void ftrace_stub_direct_tramp(void *);
> +
> #else
> struct ftrace_ops;
> # define ftrace_direct_func_count 0
> diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
> index 06218fc9374b..e6530b7b42e4 100644
> --- a/kernel/trace/trace_selftest.c
> +++ b/kernel/trace/trace_selftest.c
> @@ -784,17 +784,6 @@ static struct fgraph_ops fgraph_ops __initdata = {
> .retfunc = &trace_graph_return,
> };
>
> -#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
> -#ifndef CALL_DEPTH_ACCOUNT
> -#define CALL_DEPTH_ACCOUNT ""
> -#endif
> -
> -noinline __noclone static void trace_direct_tramp(void)
> -{
> - asm(CALL_DEPTH_ACCOUNT);
> -}
> -#endif
> -
> /*
> * Pretty much the same than for the function tracer from which the selftest
> * has been borrowed.
> @@ -875,7 +864,7 @@ trace_selftest_startup_function_graph(struct tracer *trace,
> */
> ftrace_set_filter_ip(&direct, (unsigned long)DYN_FTRACE_TEST_NAME, 0, 0);
> ret = register_ftrace_direct(&direct,
> - (unsigned long)trace_direct_tramp);
> + (unsigned long)ftrace_stub_direct_tramp);
> if (ret)
> goto out;
>
> @@ -896,7 +885,7 @@ trace_selftest_startup_function_graph(struct tracer *trace,
> unregister_ftrace_graph(&fgraph_ops);
>
> ret = unregister_ftrace_direct(&direct,
> - (unsigned long)trace_direct_tramp);
> + (unsigned long)ftrace_stub_direct_tramp);
> if (ret)
> goto out;
>
> --
> 2.30.2
>