Re: [PATCH v8 00/10] arm64: Add framework to turn an IPI as NMI

From: Mark Rutland
Date: Wed May 10 2023 - 12:30:51 EST


On Wed, May 10, 2023 at 08:28:17AM -0700, Doug Anderson wrote:
> Hi,

Hi Doug,

> On Wed, Apr 19, 2023 at 3:57 PM Douglas Anderson <dianders@xxxxxxxxxxxx> wrote:
> > This is an attempt to resurrect Sumit's old patch series [1] that
> > allowed us to use the arm64 pseudo-NMI to get backtraces of CPUs and
> > also to round up CPUs in kdb/kgdb. The last post from Sumit that I
> > could find was v7, so I called this series v8. I haven't copied all of
> > his old changelogs here, but you can find them from the link.
> >
> > Since v7, I have:
> > * Addressed the small amount of feedback that was there for v7.
> > * Rebased.
> > * Added a new patch that prevents us from spamming the logs with idle
> > tasks.
> > * Added an extra patch to gracefully fall back to regular IPIs if
> > pseudo-NMIs aren't there.
> >
> > Since there appear to be a few different patch series related to
> > being able to use NMIs to get stack traces of crashed systems, let me
> > try to organize them to the best of my understanding:
> >
> > a) This series. On its own, a) will (among other things) enable stack
> > traces of all running processes with the soft lockup detector if
> > you've enabled the sysctl "kernel.softlockup_all_cpu_backtrace". On
> > its own, a) doesn't give a hard lockup detector.
> >
> > b) A different recently-posted series [2] that adds a hard lockup
> > detector based on perf. On its own, b) gives a stack crawl of the
> > locked up CPU but no stack crawls of other CPUs (even if they're
> > locked too). Together with a) + b) we get everything (full lockup
> > detect, full ability to get stack crawls).
> >
> > c) The old Android "buddy" hard lockup detector [3] that I'm
> > considering trying to upstream. If b) lands then I believe c) would
> > be redundant (at least for arm64). c) on its own is really only
> > useful on arm64 for platforms that can print CPU_DBGPCSR somehow
> > (see [4]). a) + c) is roughly as good as a) + b).

> It's been 3 weeks and I haven't heard a peep on this series. That
> means nobody has any objections and it's all good to land, right?
> Right? :-P

FWIW, there are still longstanding soundness issues in the arm64 pseudo-NMI
support (and fixing that requires an overhaul of our DAIF / IRQ flag
management, which I've been chipping away at for a number of releases), so I
hadn't looked at this in detail yet because the foundations are still somewhat
dodgy.

I appreciate that this has been around for a while, and it's on my queue to
look at.

Thanks,
Mark.