[PATCH v9 0/7] arm64: Add debug IPI for backtraces / kgdb; try to use NMI for it

From: Douglas Anderson
Date: Thu Jun 01 2023 - 17:37:11 EST


This is an attempt to resurrect Sumit's old patch series [1] that
allowed us to use the arm64 pseudo-NMI to get backtraces of CPUs and
also to round up CPUs in kdb/kgdb. The last post from Sumit that I
could find was v7, so I started my series at v8. I haven't copied all
of his old changelogs here, but you can find them from the link.

I'm really looking for a way to land this patch series. In response to
v8, Mark Rutland indicated [2] that he was worried about the soundness
of pseudo NMI. Those soundness issues definitely need to get fixed, but
IMO this patch series could still land in the meantime. That would at
least let people test with it.

Request for anyone reading this: please help indicate your support of
this patch series landing by replying, even if you don't have the
background for a full review. My suspicion is that there are a lot of
people who agree that this would be super useful to get landed.

Since v8, I have cleaned up this patch series by integrating the 10th
patch from v8 [3] into the whole series. As part of this, I renamed
the "NMI IPI" to the "debug IPI" since it could now be backed by a
regular IPI in the case that pseudo NMIs weren't available. With the
fallback, this allowed me to drop some extra patches from the
series. This feels (to me) to be pretty clean and hopefully others
agree. I removed Masayoshi's and Chen-Yu's tags from any patch that I
touched significantly.

...also in v8, I reordered the patches a bit in a way that seemed a
little cleaner to me.
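
To illustrate the fallback idea behind the "debug IPI" naming, here is a
rough userspace model. It is only a sketch and is not code from this
series: every name in it (debug_ipi_model_setup(), debug_ipi_model_reaches(),
the enum values) is hypothetical, and it assumes the simplified view that
a pseudo-NMI always gets through while a regular IPI only gets through
when the target CPU has IRQs unmasked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only; these names do not appear in the series. */
enum debug_ipi_backing {
	DEBUG_IPI_REGULAR,	/* ordinary IPI: pends while IRQs are masked */
	DEBUG_IPI_PSEUDO_NMI,	/* pseudo-NMI: assumed to always get through */
};

struct debug_ipi {
	enum debug_ipi_backing backing;
};

/* Choose the backing once at setup; callers just "send the debug IPI". */
static void debug_ipi_model_setup(struct debug_ipi *ipi,
				  bool pseudo_nmi_available)
{
	assert(ipi != NULL);
	ipi->backing = pseudo_nmi_available ? DEBUG_IPI_PSEUDO_NMI
					    : DEBUG_IPI_REGULAR;
}

/*
 * Would the target CPU take the debug IPI right now?
 * Simplifying assumption: a pseudo-NMI is never blocked, which ignores
 * windows where a real CPU has all interrupts hard-masked.
 */
static bool debug_ipi_model_reaches(const struct debug_ipi *ipi,
				    bool target_irqs_masked)
{
	assert(ipi != NULL);
	if (ipi->backing == DEBUG_IPI_PSEUDO_NMI)
		return true;
	return !target_irqs_masked;
}
```

In this model the regular-IPI backing still works for CPUs that are
running with IRQs enabled, which is why the fallback is useful but does
not catch all cases.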

Since v7, I have:
* Addressed the small amount of feedback that was there for v7.
* Rebased.
* Added a new patch that prevents us from spamming the logs with idle
  tasks.
* Added an extra patch to gracefully fall back to regular IPIs if
  pseudo-NMIs aren't there.

Note that this patch series works very well with the recent
"hardlockup" patches that have landed through Andrew Morton's tree and
are currently in linux-next. It works especially well with the "buddy"
lockup detector.

[1] https://lore.kernel.org/linux-arm-kernel/1604317487-14543-1-git-send-email-sumit.garg@xxxxxxxxxx/
[2] https://lore.kernel.org/lkml/ZFvGqD%2F%2Fpm%2FlZb+p@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
[3] https://lore.kernel.org/r/20230419155341.v8.10.Ic3659997d6243139d0522fc3afcdfd88d7a5f030@changeid/

Changes in v9:
- Add a warning if we don't have enough IPIs for the NMI IPI
- Added comments that we might not be using NMI always.
- Added missing "inline"
- Added to commit message that this doesn't catch all cases.
- Fold in v8 patch #10 ("Fallback to a regular IPI if NMI isn't enabled")
- Moved header file out of "include" since it didn't need to be there.
- Remove arm64_supports_nmi()
- Remove fallback for when debug IPI isn't available.
- Renamed "NMI IPI" to "debug IPI" since it might not be backed by NMI.
- Update commit description
- arch_trigger_cpumask_backtrace() no longer returns bool

Changes in v8:
- "Provide a stub kgdb_nmicallback() if !CONFIG_KGDB" new for v8
- "Tag the arm64 idle functions as __cpuidle" new for v8
- Removed "#ifdef CONFIG_SMP" since arm64 is always SMP
- debug_ipi_setup() and debug_ipi_teardown() no longer take cpu param

Douglas Anderson (2):
  arm64: idle: Tag the arm64 idle functions as __cpuidle
  kgdb: Provide a stub kgdb_nmicallback() if !CONFIG_KGDB

Sumit Garg (5):
  irqchip/gic-v3: Enable support for SGIs to act as NMIs
  arm64: Add framework for a debug IPI
  arm64: smp: Assign and setup the debug IPI
  arm64: ipi_debug: Add support for backtrace using the debug IPI
  arm64: kgdb: Roundup cpus using the debug IPI

 arch/arm64/include/asm/irq.h  |   3 +
 arch/arm64/kernel/Makefile    |   2 +-
 arch/arm64/kernel/idle.c      |   4 +-
 arch/arm64/kernel/ipi_debug.c | 102 ++++++++++++++++++++++++++++++++++
 arch/arm64/kernel/ipi_debug.h |  13 +++++
 arch/arm64/kernel/kgdb.c      |  14 +++++
 arch/arm64/kernel/smp.c       |  11 ++++
 drivers/irqchip/irq-gic-v3.c  |  29 +++++++---
 include/linux/kgdb.h          |   1 +
 9 files changed, 168 insertions(+), 11 deletions(-)
 create mode 100644 arch/arm64/kernel/ipi_debug.c
 create mode 100644 arch/arm64/kernel/ipi_debug.h

--
2.41.0.rc2.161.g9c6817b8e7-goog