Re: [PATCH bpf-next v2 0/4] Add ftrace direct call for arm64

From: Florent Revest
Date: Mon Oct 17 2022 - 15:10:49 EST


Uhuh, apologies for my perf report formatting! I'll try to figure it
out for next time. Meanwhile, you can find a better-formatted version
here: https://paste.debian.net/1257405/

On Mon, Oct 17, 2022 at 8:49 PM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> On Mon, 17 Oct 2022 19:55:06 +0200
> Florent Revest <revest@xxxxxxxxxxxx> wrote:
>
> > Note that I can't really make sense of the perf report with indirect
> > calls. It always reports it spent 12% of the time in
> > rethook_trampoline_handler but I verified with both a WARN in that
> > function and a breakpoint with a debugger, this function does *not*
> > get called when running this "bench trig-fentry" benchmark. Also it
> > wouldn't make sense for fprobe_handler to call it so I'm quite
> > confused why perf would report this call and such a long time spent
> > there. Anyone know what I could be missing here?
>
> The trace shows __bpf_prog_exit, which I'm guessing is tracing the end of
> the function. Right?

Actually no, this function is called to end the context of a BPF
program execution. Here it is called at the end of the fentry program
(so still before the traced function). I hope the pastebin helps
clarify this!

> In which case I believe it must call rethook_trampoline_handler:
>
> -> fprobe_handler() /* Which could use some "unlikely()" to move disabled
>                        paths out of the hot path */
>
>       /* And also calls rethook_try_get () which does a cmpxchg! */
>
>       -> ret_hook()
>               -> arch_rethook_prepare()
>                       Sets regs->lr = arch_rethook_trampoline
>
> On return of the function, it jumps to arch_rethook_trampoline()
>
> -> arch_rethook_trampoline()
>   -> arch_rethook_trampoline_callback()
>     -> rethook_trampoline_handler()

This is indeed what happens when an fexit program is also attached.
But when running "bench trig-fentry", only an fentry program is
attached, so bpf_fprobe_entry returns a non-zero value and fprobe
doesn't call rethook_hook.
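
To make the control flow I expect a bit more concrete, here is a minimal
userspace sketch (not kernel code). All model_* names are made up for
illustration, and the only behavior it encodes is the one described
above: a non-zero return from the entry handler makes the fprobe handler
skip the return hook entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; none of the model_* names are kernel symbols. */

static int model_rethook_hook_calls; /* how often the return hook was armed */

/* Stand-in for rethook_hook(): in the kernel this would redirect the
 * traced function's return address to the rethook trampoline. */
static void model_rethook_hook(void)
{
	model_rethook_hook_calls++;
}

/* Stand-in for bpf_fprobe_entry: running the fentry programs is elided.
 * Assumption: it returns non-zero when no fexit program is attached. */
static int model_entry_handler(bool has_fexit)
{
	return has_fexit ? 0 : 1;
}

/* Stand-in for fprobe_handler(): a non-zero entry result skips rethook. */
static void model_fprobe_handler(bool has_fexit)
{
	if (model_entry_handler(has_fexit) != 0)
		return;
	model_rethook_hook();
}

/* Simulate `iterations` hits on the traced function and report how many
 * times the return hook was armed. */
static int model_run(bool has_fexit, int iterations)
{
	assert(iterations >= 0);
	model_rethook_hook_calls = 0;
	for (int i = 0; i < iterations; i++)
		model_fprobe_handler(has_fexit);
	return model_rethook_hook_calls;
}
```

In this model, model_run(false, 1000) returns 0 (the fentry-only case,
like "bench trig-fentry") and model_run(true, 1000) returns 1000 (the
case where an fexit program is attached as well).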

Also, in this situation, arch_rethook_trampoline is reached through the
traced function's return, but in the perf report, iiuc, it shows up as
being called from fprobe_handler, and that should never happen. I
wonder if this is some sort of stack unwinding artifact during the
perf record?

> So I do not know how it wouldn't trigger the WARNING or breakpoint if you
> added it there.

By the way, the WARNING does trigger if I also attach an fexit program
(then rethook_hook is called). But I made sure we skip the whole
rethook logic if no fexit program is attached, so "bench trig-fentry"
should not go through rethook_trampoline_handler.