Re: [PATCH -next V7 1/7] riscv: ftrace: Fixup panic by disabling preemption

From: Mark Rutland
Date: Mon Jan 30 2023 - 05:54:51 EST


On Sat, Jan 28, 2023 at 05:37:46PM +0800, Guo Ren wrote:
> On Thu, Jan 12, 2023 at 8:16 PM Mark Rutland <mark.rutland@xxxxxxx> wrote:
> >
> > Hi Guo,
> >
> > On Thu, Jan 12, 2023 at 04:05:57AM -0500, guoren@xxxxxxxxxx wrote:
> > > From: Andy Chiu <andy.chiu@xxxxxxxxxx>
> > >
> > > In RISC-V, we must use an AUIPC + JALR pair to encode an immediate,
> > > forming a jump to an address more than 4K away. This may cause errors
> > > if we want to enable kernel preemption and remove the dependency on
> > > patching code with stop_machine(). For example, if a task was switched
> > > out on the auipc, and we changed the ftrace function before it was
> > > switched back, then it would jump to an address whose updated 11:0
> > > bits are mixed with the previous XLEN:12 part.
> > >
> > > p: patched area performed by dynamic ftrace
> > > ftrace_prologue:
> > > p|      REG_S   ra, -SZREG(sp)
> > > p|      auipc   ra, 0x? ------------> preempted
> > >         ...
> > >                 change ftrace function
> > >         ...
> > > p|      jalr    -?(ra) <------------- switched back
> > > p|      REG_L   ra, -SZREG(sp)
> > > func:
> > >         xxx
> > >         ret
> >
> > As mentioned on the last posting, I don't think this is sufficient to fix the
> > issue. I've replied with more detail there:
> >
> > https://lore.kernel.org/lkml/Y7%2F3hoFjS49yy52W@FVFF77S0Q05N/
> >
> > Even in a non-preemptible SMP kernel, if one CPU can be in the middle of
> > executing the ftrace_prologue while another CPU is patching the
> > ftrace_prologue, you have the exact same issue.
> >
> > For example, if CPU X is in the prologue and fetches the old AUIPC and the new
> > JALR (because it races with CPU Y modifying those), CPU X will branch to the
> > wrong address. The race window is much smaller in the absence of preemption,
> > but it's still there (and will be exacerbated in virtual machines since the
> > hypervisor can preempt a vCPU at any time).
> >
> > Note that the above is even assuming that instruction fetches are atomic, which
> > I'm not sure is the case; for example arm64 has special CMODX / "Concurrent
> > MODification and eXecution of instructions" rules which mean only certain
> > instructions can be patched atomically.
> >
> > Either I'm missing something that provides mutual exclusion between the
> > patching and execution of the ftrace_prologue, or this patch is not sufficient.
> This patch is sufficient because riscv isn't the same as arm64. It
> uses the default arch_ftrace_update_code(), which uses stop_machine().
> See kernel/trace/ftrace.c:
> void __weak arch_ftrace_update_code(int command)
> {
> 	ftrace_run_stop_machine(command);
> }

Ah; sorry, I had misunderstood here, since the commit message spoke in terms of
removing that.

As long as stop_machine() is used I agree this is safe; sorry for the noise.

> ps:
> Yes, it's not good, and it's expensive.

We can't have everything! :)

Thanks,
Mark.