Re: [PATCH v12 0/7] arm64: Add IPI for backtraces / kgdb; try to use NMI for some IPIs

From: Chen-Yu Tsai
Date: Thu Aug 31 2023 - 03:09:06 EST


On Thu, Aug 31, 2023 at 3:14 AM Douglas Anderson <dianders@xxxxxxxxxxxx> wrote:
>
> This is an attempt to resurrect Sumit's old patch series [1] that
> allowed us to use the arm64 pseudo-NMI to get backtraces of CPUs and
> also to round up CPUs in kdb/kgdb. The last post from Sumit that I
> could find was v7, so I started my series at v8. I haven't copied all
> of his old changelongs here, but you can find them from the link.
>
> This patch series targets v6.6. Specifically it can't land in v6.5
> since it depends on commit 8d539b84f1e3 ("nmi_backtrace: allow
> excluding an arbitrary CPU").
>
> It should be noted that Mark still feels there might be some corner
> cases where pseudo-NMI is not production ready [2] [3], but as far as
> I'm aware there are no concrete/documented issues. Regardless of
> whether this should be enabled for production, though, this series
> will be invaluable to anyone trying to debug crashes on arm64
> machines.
>
> v12 of this series collects tags, fixes a few small nits in comments
> and commit messages from v11 and adds a new (and somewhat unrelated)
> small patch to the end of the series. There are no code changes other
> than the last patch, which is tiny.
>
> v11 of this series addressed Stephen Boyd's feedback on v10 and added
> a missing "static" that the patches robot found.
>
> v10 of this series attempted to address all of Mark's feedback on
> v9. As a quick summary:
> - It includes his patch to remove IPI_WAKEUP, freeing up an extra IPI.
> - It no longer combines the "kgdb" and "backtrace" IPIs. If we need
> another IPI these could always be recombined later.
> - It promotes IPI_CPU_STOP and IPI_CPU_CRASH_STOP to NMI.
> - It puts nearly all the code directly in smp.c.
> - Several of the patches are squashed together.
> - Patch #6 ("kgdb: Provide a stub kgdb_nmicallback() if !CONFIG_KGDB")
> was dropped from the series since it landed.
>
> Between v8 and v9, I had cleaned up this patch series by integrating
> the 10th patch from v8 [4] into the whole series. As part of this, I
> renamed the "NMI IPI" to the "debug IPI" since it could now be backed
> by a regular IPI in the case that pseudo NMIs weren't available. With
> the fallback, this allowed me to drop some extra patches from the
> series. This feels (to me) to be pretty clean and hopefully others
> agree. Any patch I touched significantly I removed Masayoshi and
> Chen-Yu's tags from.
>
> ...also in v8, I reorderd the patches a bit in a way that seemed a
> little cleaner to me.
>
> Since v7, I have:
> * Addressed the small amount of feedback that was there for v7.
> * Rebased.
> * Added a new patch that prevents us from spamming the logs with idle
> tasks.
> * Added an extra patch to gracefully fall back to regular IPIs if
> pseudo-NMIs aren't there.
>
> It can be noted that this patch series works very well with the recent
> "hardlockup" patches that have landed through Andrew Morton's tree and
> are currently in mainline. It works especially well with the "buddy"
> lockup detector.
>
> [1] https://lore.kernel.org/linux-arm-kernel/1604317487-14543-1-git-send-email-sumit.garg@xxxxxxxxxx/
> [2] https://lore.kernel.org/lkml/ZFvGqD%2F%2Fpm%2FlZb+p@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> [3] https://lore.kernel.org/lkml/ZNDKVP2m-iiZCz3v@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> [4] https://lore.kernel.org/r/20230419155341.v8.10.Ic3659997d6243139d0522fc3afcdfd88d7a5f030@changeid/
>
> Changes in v12:
> - ("arm64: smp: Mark IPI globals as __ro_after_init") new for v12.
> - Added a comment about why we account for 16 SGIs when Linux uses 8.
> - Minor comment change to add "()" after nmi_trigger_cpumask_backtrace.
> - Updated the commit hash of the commit this depends on.
>
> Changes in v11:
> - Adjust comment about NR_IPI/MAX_IPI.
> - Don't use confusing "backed by" idiom in comment.
> - Made arm64_backtrace_ipi() static.
> - Updated commit message as per Stephen.
> - arch_send_wakeup_ipi() now takes an unsigned int.
>
> Changes in v10:
> - ("IPI_CPU_STOP and IPI_CPU_CRASH_STOP should try for NMI") new for v10.
> - ("arm64: smp: Remove dedicated wakeup IPI") new for v10.
> - Backtrace now directly supported in smp.c
> - Don't allocate the cpumask on the stack; just iterate.
> - Moved kgdb calls to smp.c to avoid needing to export IPI info.
> - Rewrite as needed for 5.11+ as per Mark Rutland and Sumit.
> - Squash backtrace into patch adding support for pseudo-NMI IPIs.
> - kgdb now has its own IPI.
>
> Changes in v9:
> - Added comments that we might not be using NMI always.
> - Added to commit message that this doesn't catch all cases.
> - Fold in v8 patch #10 ("Fallback to a regular IPI if NMI isn't enabled")
> - Moved header file out of "include" since it didn't need to be there.
> - Remove arm64_supports_nmi()
> - Remove fallback for when debug IPI isn't available.
> - Renamed "NMI IPI" to "debug IPI" since it might not be backed by NMI.
> - arch_trigger_cpumask_backtrace() no longer returns bool
>
> Changes in v8:
> - "Tag the arm64 idle functions as __cpuidle" new for v8
> - Removed "#ifdef CONFIG_SMP" since arm64 is always SMP
> - debug_ipi_setup() and debug_ipi_teardown() no longer take cpu param
>
> Douglas Anderson (6):
> irqchip/gic-v3: Enable support for SGIs to act as NMIs
> arm64: idle: Tag the arm64 idle functions as __cpuidle
> arm64: smp: Add arch support for backtrace using pseudo-NMI
> arm64: smp: IPI_CPU_STOP and IPI_CPU_CRASH_STOP should try for NMI
> arm64: kgdb: Implement kgdb_roundup_cpus() to enable pseudo-NMI
> roundup
> arm64: smp: Mark IPI globals as __ro_after_init
>
> Mark Rutland (1):
> arm64: smp: Remove dedicated wakeup IPI

Whole series is

Tested-by: Chen-Yu Tsai <wenst@xxxxxxxxxxxx>

on SolidRun i.MX8MM Hummingboard Pulse by injecting HARDLOCKUP with lkdtm.