Re: [PATCH] riscv: entry: Fixup do_trap_break from kernel side

From: Guo Ren
Date: Sun Jul 16 2023 - 19:33:51 EST


On Mon, Jul 10, 2023 at 4:02 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Sun, Jul 09, 2023 at 10:30:22AM +0800, Guo Ren wrote:
> > On Wed, Jul 5, 2023 at 12:40 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > >
> > > On Sat, Jul 01, 2023 at 10:57:07PM -0400, guoren@xxxxxxxxxx wrote:
> > > > From: Guo Ren <guoren@xxxxxxxxxxxxxxxxx>
> > > >
> > > > irqentry_nmi_enter/exit forces the current context into in_interrupt().
> > > > That makes the kernel panic outright, but kdb still needs "ebreak" to
> > > > debug the kernel.
> > > >
> > > > Replacing irqentry_nmi_enter/exit with exception_enter/exit corrects
> > > > handle_break for the kernel side.
> > >
> > > This doesn't explain much if anything :/
> > >
> > > I'm confused (probably because I don't know RISC-V very well), what's
> > > EBREAK and how does it happen?
> > EBREAK is just a RISC-V instruction that raises a breakpoint exception.
> >
> >
> > >
> > > Specifically, if EBREAK can happen inside a local_irq_disable() region,
> > > then the below change is actively wrong. Any exception/interrupt that
> > > can happen while local_irq_disable() must be treated like an NMI.
> > The ebreak happened outside the local_irq_disable() region, but
> > __nmi_enter() forces handle_break() into the in_interrupt() state. So how
>
> And why is that a problem? I think I'm missing something fundamental
> here...
irqentry_nmi_enter() forces the current context to report
in_interrupt() == true, even when the ebreak happened in a context where
in_interrupt() was false.
A lot of code has checks such as:
        if (in_interrupt())
                panic("Fatal exception in interrupt");
That makes the kernel panic, but we don't want a panic here; we want to
get back to the shell.
e.g.:
echo BUG > /sys/kernel/debug/provoke-crash/DIRECT
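
To illustrate the effect, here is a minimal userspace sketch. It is not kernel code: the model_* names are hypothetical, and the bit layout is assumed to mirror include/linux/preempt.h (8 preempt bits, 8 softirq bits, 4 hardirq bits, 4 NMI bits). It only shows that an nmi_enter-style increment makes an in_interrupt()-style test true, regardless of the context that was interrupted.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed layout, modelled on include/linux/preempt.h (simplified). */
#define MODEL_PREEMPT_BITS   8
#define MODEL_SOFTIRQ_BITS   8
#define MODEL_HARDIRQ_BITS   4
#define MODEL_NMI_BITS       4

#define MODEL_SOFTIRQ_SHIFT  (MODEL_PREEMPT_BITS)
#define MODEL_HARDIRQ_SHIFT  (MODEL_SOFTIRQ_SHIFT + MODEL_SOFTIRQ_BITS)
#define MODEL_NMI_SHIFT      (MODEL_HARDIRQ_SHIFT + MODEL_HARDIRQ_BITS)

#define MODEL_SOFTIRQ_MASK   (((1UL << MODEL_SOFTIRQ_BITS) - 1) << MODEL_SOFTIRQ_SHIFT)
#define MODEL_HARDIRQ_MASK   (((1UL << MODEL_HARDIRQ_BITS) - 1) << MODEL_HARDIRQ_SHIFT)
#define MODEL_NMI_MASK       (((1UL << MODEL_NMI_BITS) - 1) << MODEL_NMI_SHIFT)

#define MODEL_HARDIRQ_OFFSET (1UL << MODEL_HARDIRQ_SHIFT)
#define MODEL_NMI_OFFSET     (1UL << MODEL_NMI_SHIFT)

/* Hypothetical stand-in for the per-CPU preempt count. */
static unsigned long model_preempt_count;

/* Models in_interrupt(): true if any NMI, hardirq or softirq bit is set. */
static bool model_in_interrupt(void)
{
	return (model_preempt_count &
		(MODEL_NMI_MASK | MODEL_HARDIRQ_MASK | MODEL_SOFTIRQ_MASK)) != 0;
}

/* Models the counting part of __nmi_enter(): bump NMI and hardirq counts. */
static void model_nmi_enter(void)
{
	model_preempt_count += MODEL_NMI_OFFSET + MODEL_HARDIRQ_OFFSET;
}

/* Models the counting part of __nmi_exit(): undo the increment. */
static void model_nmi_exit(void)
{
	model_preempt_count -= MODEL_NMI_OFFSET + MODEL_HARDIRQ_OFFSET;
}

/* Walk through the sequence seen by a handler wrapped in nmi_enter/exit. */
static void model_demo(void)
{
	assert(!model_in_interrupt());	/* plain task context */
	model_nmi_enter();
	assert(model_in_interrupt());	/* handler now looks like IRQ context */
	model_nmi_exit();
	assert(!model_in_interrupt());	/* original state restored */
}
```

Any check of the form "if (in_interrupt()) panic(...)" that runs between the enter and exit calls therefore fires, even though the ebreak itself came from task context.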

>
> > about:
> >
> > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> > index f910dfccbf5d..69f7043a98b9 100644
> > --- a/arch/riscv/kernel/traps.c
> > +++ b/arch/riscv/kernel/traps.c
> > @@ -18,6 +18,7 @@
> > #include <linux/irq.h>
> > #include <linux/kexec.h>
> > #include <linux/entry-common.h>
> > +#include <linux/context_tracking.h>
> >
> > #include <asm/asm-prototypes.h>
> > #include <asm/bug.h>
> > @@ -285,12 +286,18 @@ asmlinkage __visible __trap_section void do_trap_break(struct pt_regs *regs)
> >                  handle_break(regs);
> >
> >                  irqentry_exit_to_user_mode(regs);
> > -        } else {
> > +        } else if (in_interrupt()) {
> >                  irqentry_state_t state = irqentry_nmi_enter(regs);
> >
> >                  handle_break(regs);
> >
> >                  irqentry_nmi_exit(regs, state);
> > +        } else {
> > +                enum ctx_state prev_state = exception_enter();
> > +
> > +                handle_break(regs);
> > +
> > +                exception_exit(prev_state);
> >          }
> >  }
>
> That's wrong. If you want to make it conditional, you have to look at
> !(regs->status & SR_IE) (that's the interrupt enable flag of the
> interrupted context, right?)
>
> When you hit an EBREAK when IRQs were disabled, you must be NMI like.
>
> But making it conditional like this makes it really hard to write a
> handler though, it basically must assume it will be NMI context (because
> it can't know) so there is no point in sometimes not doing NMI context.
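
For reference, a minimal userspace sketch of the condition described above: choosing the entry path from the interrupted context's interrupt-enable state rather than from in_interrupt(). The model_* names are hypothetical. It assumes that, after a trap into S-mode, the interrupted context's enable bit is held in the SPIE bit (bit 5) of the saved status, which is what the RISC-V privileged spec describes and what I believe regs_irqs_disabled() tests via SR_PIE. It only models the decision and does not address the point that a handler would still have to assume NMI context.

```c
#include <assert.h>
#include <stdbool.h>

/* sstatus.SPIE (bit 5): previous interrupt-enable, saved by hardware on trap. */
#define MODEL_SR_SPIE 0x20UL

/* Hypothetical stand-in for the one pt_regs field used here. */
struct model_pt_regs {
	unsigned long status;
};

enum model_break_path {
	MODEL_BREAK_USER,	/* irqentry_enter_from_user_mode() path */
	MODEL_BREAK_NMI_LIKE,	/* irqentry_nmi_enter()/exit() path */
	MODEL_BREAK_EXCEPTION,	/* exception_enter()/exit() path */
};

/* True if the interrupted context was running with IRQs disabled. */
static bool model_regs_irqs_disabled(const struct model_pt_regs *regs)
{
	return !(regs->status & MODEL_SR_SPIE);
}

/* Pick the entry path from the interrupted context, not from in_interrupt(). */
static enum model_break_path
model_pick_break_path(const struct model_pt_regs *regs, bool from_user)
{
	if (from_user)
		return MODEL_BREAK_USER;
	if (model_regs_irqs_disabled(regs))
		return MODEL_BREAK_NMI_LIKE;	/* must be treated like an NMI */
	return MODEL_BREAK_EXCEPTION;
}
```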






--
Best Regards
Guo Ren