Re: [PATCH] arm64: smp: smp_send_stop() and crash_smp_send_stop() should try non-NMI first

From: Daniel Thompson
Date: Fri Mar 01 2024 - 06:42:24 EST


On Thu, Feb 29, 2024 at 10:34:26AM -0800, Doug Anderson wrote:
> Hi,
>
> On Wed, Feb 28, 2024 at 5:11 AM Daniel Thompson
> <daniel.thompson@xxxxxxxxxx> wrote:
> >
> > > I'm still hoping to get some sort of feedback here. If people think
> > > this is a terrible idea then I'll shut up now and leave well enough
> > > alone, but it would be nice to actively decide and get the patch out
> > > of limbo.
> >
> > I've read patch through a couple of times and was generally convinced by
> > the "do what x86 does" argument.
> >
> > However until now I've always held my council since I wasn't familiar
> > with these code paths and I figured it was OK for me to have no opinion
> > because the first line of the description says that kgdb/kdb is 100% not
> > involved in causing the problem ;-) .
> >
> > However today I also took a look at the HAVE_NMI architectures and there
> > is no consensus between them about how to implement this: PowerPC uses
> > NMI and most of the others use IRQ only, s390 special cases for the
> > panic code path and acts differently compared to a normal SMP shutdown.
> >
> > <snip>
> >
> > However, if we talking ourselves into copying x86 then perhaps we should
> > more accurately copy x86! Assuming I read the x86 code correctly then
> > crash_smp_send_stop() will (mostly) go staight to NMI rather
> > than trialling an IRQ first! That is not what is currently implemented
> > in the patch for arm64.
>
> Sure, I'm happy to change the patch to work that way, though I might
> wait to get some confirmation from a maintainer that they think this
> idea is worth pursuing before spending more time on it.

100%. Don't respin on my account.

> I don't think it would be hard to have the "crash stop" code jump
> straight to NMI if that's what people want. Matching x86 here seems
> reasonable, though I'd also say that my gut still says that even for
> crash stop we should try to stop things cleanly before jumping to NMI.
> I guess I could imagine that the code we're kexec-ing to generate the
> core file might be more likely to find the hardware in a funny state
> if we stopped CPUs w/ NMI vs IRQ.

In terms of the "right thing to do" for kdump then reviewing the s390
might be a good idea. Unfortunately it's a bit different to the other
arches and I can't offer a 95% answer about what that arch does.


Daniel.