Re: [PATCH -next V7 0/7] riscv: Optimize function trace

From: Guo Ren
Date: Mon Feb 06 2023 - 22:58:00 EST


On Mon, Feb 6, 2023 at 5:56 PM Mark Rutland <mark.rutland@xxxxxxx> wrote:
>
> On Sat, Feb 04, 2023 at 02:40:52PM +0800, Guo Ren wrote:
> > On Mon, Jan 16, 2023 at 11:02 PM Evgenii Shatokhin
> > <e.shatokhin@xxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > On 12.01.2023 12:05, guoren@xxxxxxxxxx wrote:
> > > > From: Guo Ren <guoren@xxxxxxxxxxxxxxxxx>
> > > >
> > > > The previous ftrace detour implementation fc76b8b8011 ("riscv: Using
> > > > PATCHABLE_FUNCTION_ENTRY instead of MCOUNT") contain three problems.
> > > >
> > > > - The most horrible bug is preemption panic which found by Andy [1].
> > > > Let's disable preemption for ftrace first, and Andy could continue
> > > > the ftrace preemption work.
> > >
> > > It seems, the patches #2-#7 of this series do not require "riscv:
> > > ftrace: Fixup panic by disabling preemption" and can be used without it.
> > >
> > > How about moving that patch out of the series and processing it separately?
> > Okay.
> >
> > >
> > > As it was pointed out in the discussion of that patch, some other
> > > solution to non-atomic changes of the prologue might be needed anyway.
> > I think you mean Mark Rutland's DYNAMIC_FTRACE_WITH_CALL_OPS. But that
> > still needs to be ready. Let's disable PREEMPT for ftrace first.
>
> FWIW, taking the patch to disable FTRACE with PREEMPT for now makes sense to
> me, too.
Thx, you agree with that.

>
> The DYNAMIC_FTRACE_WITH_CALL_OPS patches should be in v6.3. They're currently
> queued in the arm64 tree in the for-next/ftrace branch:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/ftrace
> https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/
>
> ... and those *should* be in v6.3.
Glade to hear that. Great!

>
> Patches to imeplement DIRECT_CALLS atop that are in review at the moment:
>
> https://lore.kernel.org/linux-arm-kernel/20230201163420.1579014-1-revest@xxxxxxxxxxxx/
Good reference. Thx for sharing.

>
> ... and if riscv uses the CALL_OPS approach, I believe it can do much the same
> there.
>
> If riscv wants to do a single atomic patch to each patch-site (to avoid
> stop_machine()), then direct calls would always needs to bounce through the
> ftrace_caller trampoline (and acquire the direct call from the ftrace_ops), but
> that might not be as bad as it sounds -- from benchmarking on arm64, the bulk
> of the overhead seen with direct calls is when using the list_ops or having to
> do a hash lookup, and both of those are avoided with the CALL_OPS approach.
> Calling directly from the patch-site is a minor optimization relative to
> skipping that work.
Yes, CALL_OPS could solve the PREEMPTION & stop_machine problems. I
would follow up.

The difference from arm64 is that RISC-V is 16bit/32bit mixed
instruction ISA, so we must keep ftrace_caller & ftrace_regs_caller in
2048 aligned. Then:
FTRACE_UPDATE_MAKE_CALL:
* addr+00: NOP // Literal (first 32 bits)
* addr+04: NOP // Literal (last 32 bits)
* addr+08: func: auipc t0, ? // All trampolines are in the 2048
aligned place, so this point won't be changed.
* addr+12: jalr ?(t0) // For different trampolines:
ftrace_regs_caller, ftrace_caller

FTRACE_UPDATE_MAKE_NOP:
* addr+00: NOP // Literal (first 32 bits)
* addr+04: NOP // Literal (last 32 bits)
* addr+08: func: c.j // jump to addr + 16 and skip broken insn & jalr
* addr+10: xxx // last half & broken insn of auipc t0, ?
* addr+12: jalr ?(t0) // To be patched to jalr ?<t0> ()
* addr+16: func body

Right? (The call site would be increased from 64bit to 128bit ahead of func.)

>
> Thanks,
> Mark.




--
Best Regards
Guo Ren