[PATCH 0/3] *** Detect interrupt storm in softlockup ***

From: Bitao Hu
Date: Tue Jan 23 2024 - 07:12:45 EST


Hi guys,

I have previously encountered an issue where an NVMe interrupt
storm caused a softlockup, but the call tree did not provide useful
information. This is because the call tree is merely a snapshot and
does not fully reflect the CPU's state over the duration of the
softlockup_thresh period. Consequently, I think that reporting CPU
utilization (system, softirq, hardirq, idle) during a softlockup would
be beneficial for identifying issues related to interrupt storms, as
well as assisting in the analysis of other causes of softlockup.
Furthermore, reporting the most time-consuming hardirqs during a
softlockup could directly pinpoint which interrupt is responsible
for the issue.
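
To illustrate the idea, here is a minimal userspace sketch. It is not
the code in these patches; the names util_snapshot, util_percent,
irq_sample and busiest_irq are hypothetical and exist only for this
example. It assumes two snapshots of cumulative per-CPU time counters
(in nanoseconds) taken at the start and end of the sampling window, and
shows how the deltas could be turned into utilization percentages and
how the hardirq with the largest time delta could be picked out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical categories, mirroring the four named in this letter. */
enum util_kind { UTIL_SYSTEM, UTIL_SOFTIRQ, UTIL_HARDIRQ, UTIL_IDLE, UTIL_NR };

/* One snapshot of cumulative per-CPU time, in nanoseconds (assumed unit). */
struct util_snapshot {
	uint64_t timestamp_ns;
	uint64_t time_ns[UTIL_NR];
};

/*
 * Convert two snapshots into integer percentages of the elapsed window.
 * Counters are assumed to be monotonic. Returns 0 on success, -1 if the
 * window is empty or runs backwards.
 */
static int util_percent(const struct util_snapshot *prev,
			const struct util_snapshot *curr,
			unsigned int out[UTIL_NR])
{
	assert(prev && curr && out);

	if (curr->timestamp_ns <= prev->timestamp_ns)
		return -1;

	uint64_t window = curr->timestamp_ns - prev->timestamp_ns;

	for (size_t i = 0; i < UTIL_NR; i++) {
		uint64_t delta = curr->time_ns[i] - prev->time_ns[i];

		/* Clamp accounting skew so a category never exceeds 100%. */
		if (delta > window)
			delta = window;
		out[i] = (unsigned int)(delta * 100 / window);
	}
	return 0;
}

/* Cumulative time spent in one hardirq at both ends of the window. */
struct irq_sample {
	int irq;
	uint64_t prev_ns;
	uint64_t curr_ns;
};

/* Return the irq number with the largest time delta, or -1 if n == 0. */
static int busiest_irq(const struct irq_sample *s, size_t n)
{
	int best = -1;
	uint64_t best_delta = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t delta = s[i].curr_ns - s[i].prev_ns;

		if (best < 0 || delta > best_delta) {
			best = s[i].irq;
			best_delta = delta;
		}
	}
	return best;
}
```

In this sketch, a window in which hardirq time accounts for most of the
elapsed time, together with a single irq dominating busiest_irq(), is
the pattern one would expect from an interrupt storm.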

Bitao Hu (3):
  watchdog/softlockup: low-overhead detection of interrupt storm
  watchdog/softlockup: report the most time-consuming hardirq
  watchdog/softlockup: add parameter to control the reporting of
    time-consuming hardirq

 include/linux/irq.h     |   9 ++
 include/linux/irqdesc.h |   2 +
 kernel/irq/irqdesc.c    |   9 +-
 kernel/watchdog.c       | 289 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 308 insertions(+), 1 deletion(-)

--
2.37.1 (Apple Git-137.1)