Re: [PATCH v8 09/10] arm64: kgdb: Roundup cpus using IPI as NMI

From: Doug Anderson
Date: Mon May 15 2023 - 19:12:35 EST


Hi,

On Fri, May 12, 2023 at 7:00 AM Daniel Thompson
<daniel.thompson@xxxxxxxxxx> wrote:
>
> On Wed, Apr 19, 2023 at 03:56:03PM -0700, Douglas Anderson wrote:
> > From: Sumit Garg <sumit.garg@xxxxxxxxxx>
> >
> > arm64 platforms with GICv3 or later supports pseudo NMIs which can be
> > leveraged to roundup CPUs which are stuck in hard lockup state with
> > interrupts disabled that wouldn't be possible with a normal IPI.
> >
> > So instead switch to roundup CPUs using IPI turned as NMI. And in
> > case a particular arm64 platform doesn't supports pseudo NMIs,
> > it will switch back to default kgdb CPUs roundup mechanism.
> >
> > Signed-off-by: Sumit Garg <sumit.garg@xxxxxxxxxx>
> > Tested-by: Chen-Yu Tsai <wens@xxxxxxxx>
> > Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> > ---
> >
> > (no changes since v1)
> >
> > arch/arm64/kernel/ipi_nmi.c | 5 +++++
> > arch/arm64/kernel/kgdb.c | 18 ++++++++++++++++++
> > 2 files changed, 23 insertions(+)
> >
> > diff --git a/arch/arm64/kernel/ipi_nmi.c b/arch/arm64/kernel/ipi_nmi.c
> > index c592e92b8cbf..2adaaf1519e5 100644
> > --- a/arch/arm64/kernel/ipi_nmi.c
> > +++ b/arch/arm64/kernel/ipi_nmi.c
> > @@ -8,6 +8,7 @@
> >
> > #include <linux/interrupt.h>
> > #include <linux/irq.h>
> > +#include <linux/kgdb.h>
> > #include <linux/nmi.h>
> > #include <linux/smp.h>
> >
> > @@ -45,10 +46,14 @@ bool arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
> > static irqreturn_t ipi_nmi_handler(int irq, void *data)
> > {
> > irqreturn_t ret = IRQ_NONE;
> > + unsigned int cpu = smp_processor_id();
>
> Does this play nice with CONFIG_DEBUG_PREEMPT? I may have missed
> something about the NMI entry but a quick scan of the arm64
> arch_irq_disabled() suggests that debug_smp_processor_id() will issue
> warnings at this point...
>
> Other than I didn't see anything I don't like here.

Good question. It seems to, at least on the sc7180-trogdor-based
system I tested.

Just to confirm, I printed out the values of some of the stuff that's
checked. When this function was called, I saw:

irqs_disabled() => true
raw_local_save_flags() => 0x000000f0
__irqflags_uses_pmr() => 1

The "__irqflags_uses_pmr" is the thing that gets set when we try to
enable pseudo-NMIs and they're actually there. That causes us to call
__pmr_irqs_disabled_flags() which will compare the flags (0xf0) to
GIC_PRIO_IRQON (0xe0). If flags is not the same as GIC_PRIO_IRQON then
interrupts are disabled.

...so, assuming I understood everything, I think we're OK.

I also tried to see what happened with the whole "fallback to use
regular IPIs if we don't have pseudo-NMIs enabled" (AKA patch #10 in
this series). In that case, I had:

irqs_disabled() => true
raw_local_save_flags() => 0x000000c0
__irqflags_uses_pmr() => 0

...in this case we end up in __daif_irqs_disabled_flags(). That checks
to see if the flags (0xc0) has the "I bit" (0x80) set. It is set, so
interrupts are disabled.

-Doug