Re: [PATCH v9 0/7] arm64: Add debug IPI for backtraces / kgdb; try to use NMI for it

From: Doug Anderson
Date: Mon Jul 24 2023 - 11:56:25 EST


Hi folks,

On Thu, Jun 1, 2023 at 2:37 PM Douglas Anderson <dianders@xxxxxxxxxxxx> wrote:
>
> This is an attempt to resurrect Sumit's old patch series [1] that
> allowed us to use the arm64 pseudo-NMI to get backtraces of CPUs and
> also to round up CPUs in kdb/kgdb. The last post from Sumit that I
> could find was v7, so I started my series at v8. I haven't copied all
> of his old changelogs here, but you can find them from the link.
>
> I'm really looking for a way to land this patch series. In response to
> v8, Mark Rutland indicated [2] that he was worried about the soundness
> of pseudo NMI. Those issues definitely need to get fixed, but IMO this patch
> series could still land in the meantime. That would at least let
> people test with it.
>
> Request for anyone reading this: please help indicate your support of
> this patch series landing by replying, even if you don't have the
> background for a full review. My suspicion is that there are a lot of
> people who agree that this would be super useful to get landed.
>
> Since v8, I have cleaned up this patch series by integrating the 10th
> patch from v8 [3] into the whole series. As part of this, I renamed
> the "NMI IPI" to the "debug IPI" since it could now be backed by a
> regular IPI in the case that pseudo NMIs weren't available. With the
> fallback, this allowed me to drop some extra patches from the
> series. This feels (to me) to be pretty clean and hopefully others
> agree. I removed Masayoshi's and Chen-Yu's tags from any patch that I
> touched significantly.
>
> ...also in v8, I reordered the patches a bit in a way that seemed a
> little cleaner to me.
>
> Since v7, I have:
> * Addressed the small amount of feedback that was there for v7.
> * Rebased.
> * Added a new patch that prevents us from spamming the logs with idle
>   tasks.
> * Added an extra patch to gracefully fall back to regular IPIs if
>   pseudo-NMIs aren't there.
>
> It can be noted that this patch series works very well with the recent
> "hardlockup" patches that have landed through Andrew Morton's tree and
> are currently in linux-next. It works especially well with the "buddy"
> lockup detector.
>
> [1] https://lore.kernel.org/linux-arm-kernel/1604317487-14543-1-git-send-email-sumit.garg@xxxxxxxxxx/
> [2] https://lore.kernel.org/lkml/ZFvGqD%2F%2Fpm%2FlZb+p@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> [3] https://lore.kernel.org/r/20230419155341.v8.10.Ic3659997d6243139d0522fc3afcdfd88d7a5f030@changeid/
>
> Changes in v9:
> - Add a warning if we don't have enough IPIs for the NMI IPI
> - Added comments that we might not be using NMI always.
> - Added missing "inline"
> - Added to commit message that this doesn't catch all cases.
> - Fold in v8 patch #10 ("Fallback to a regular IPI if NMI isn't enabled")
> - Moved header file out of "include" since it didn't need to be there.
> - Remove arm64_supports_nmi()
> - Remove fallback for when debug IPI isn't available.
> - Renamed "NMI IPI" to "debug IPI" since it might not be backed by NMI.
> - Update commit description
> - arch_trigger_cpumask_backtrace() no longer returns bool
>
> Changes in v8:
> - "Provide a stub kgdb_nmicallback() if !CONFIG_KGDB" new for v8
> - "Tag the arm64 idle functions as __cpuidle" new for v8
> - Removed "#ifdef CONFIG_SMP" since arm64 is always SMP
> - debug_ipi_setup() and debug_ipi_teardown() no longer take cpu param
>
> Douglas Anderson (2):
>   arm64: idle: Tag the arm64 idle functions as __cpuidle
>   kgdb: Provide a stub kgdb_nmicallback() if !CONFIG_KGDB
>
> Sumit Garg (5):
>   irqchip/gic-v3: Enable support for SGIs to act as NMIs
>   arm64: Add framework for a debug IPI
>   arm64: smp: Assign and setup the debug IPI
>   arm64: ipi_debug: Add support for backtrace using the debug IPI
>   arm64: kgdb: Roundup cpus using the debug IPI
>
> arch/arm64/include/asm/irq.h | 3 +
> arch/arm64/kernel/Makefile | 2 +-
> arch/arm64/kernel/idle.c | 4 +-
> arch/arm64/kernel/ipi_debug.c | 102 ++++++++++++++++++++++++++++++++++
> arch/arm64/kernel/ipi_debug.h | 13 +++++
> arch/arm64/kernel/kgdb.c | 14 +++++
> arch/arm64/kernel/smp.c | 11 ++++
> drivers/irqchip/irq-gic-v3.c | 29 +++++++---
> include/linux/kgdb.h | 1 +
> 9 files changed, 168 insertions(+), 11 deletions(-)

I'm looking for some ideas on what to do to move this patch series
forward. Thanks to Daniel, the kgdb patch is now in Linus's tree which
hopefully makes this simpler to land. I guess there is still the
irqchip dependency that will need to be sorted out, though...

Even if folks aren't in agreement about whether this is ready to be
enabled in production, I don't think anything here is super
objectionable or controversial, is it? Can we land it? If you feel
like it needs extra review, would it help if I tried to drum up some
extra people to provide review feedback?

Also: in case it's interesting to anyone, I've been doing benchmarks
on sc7180-trogdor devices in preparation for enabling this. On that
platform, I did manage to see about a 4% reduction in a set of hackbench
numbers when fully enabling pseudo-NMI. However, when I instead ran
Speedometer 2.1 I saw no difference. See:

https://issuetracker.google.com/issues/197061987

-Doug