[PATCH v3 0/2] irq: detect slow IRQ handlers

From: Mark Rutland
Date: Thu Jul 15 2021 - 05:50:38 EST


Hi,

While fuzzing arm64 with Syzkaller (under QEMU+KVM) over a number of releases,
I've occasionally seen some ridiculously long stalls (20+ seconds), where it
appears that a CPU is stuck in a hard IRQ context. As this gets detected after
the CPU returns to the interrupted context, it's difficult to identify where
exactly the stall is coming from.

These patches are intended to help track this down, with a WARN() if an IRQ
handler takes longer than a given timeout (1 second by default), logging the
specific IRQ and handler function. While it's possible to achieve something
similar with tracing, it's harder to integrate that into an automated fuzzing
setup.
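
As a rough userspace illustration of the idea (not the code from the patches),
the check amounts to timestamping either side of the handler call and
complaining if the difference exceeds a threshold. Every name here
(call_handler_timed, struct timed_result, the example handlers) is hypothetical,
and the sketch assumes a POSIX monotonic clock and fprintf() in place of
whatever clock and WARN() path the kernel code uses:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* All names below are hypothetical; this is not the code from the patches. */
typedef int (*handler_fn)(int irq, void *dev_id);

struct timed_result {
	int ret;		/* value returned by the handler */
	uint64_t elapsed_ns;	/* wall time spent inside the handler */
	int too_slow;		/* non-zero if elapsed_ns exceeded the timeout */
};

/* Read a monotonic clock in nanoseconds (stand-in for a kernel clock). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Invoke the handler, measure how long it ran, and report the IRQ number
 * and handler name if it ran for longer than timeout_ns.
 */
static struct timed_result call_handler_timed(int irq, const char *name,
					      handler_fn handler, void *dev_id,
					      uint64_t timeout_ns)
{
	struct timed_result res;
	uint64_t start;

	assert(handler != NULL);

	start = now_ns();
	res.ret = handler(irq, dev_id);
	res.elapsed_ns = now_ns() - start;
	res.too_slow = res.elapsed_ns > timeout_ns;

	if (res.too_slow)
		fprintf(stderr, "WARNING: IRQ %d handler %s took %llu ns\n",
			irq, name, (unsigned long long)res.elapsed_ns);

	return res;
}

/* Example handler that returns immediately. */
static int fast_handler(int irq, void *dev_id)
{
	(void)irq;
	(void)dev_id;
	return 1;
}

/* Example handler that stalls for roughly 20 milliseconds. */
static int slow_handler(int irq, void *dev_id)
{
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 20 * 1000 * 1000 };

	(void)irq;
	(void)dev_id;
	nanosleep(&delay, NULL);
	return 1;
}
```

Calling call_handler_timed(2, "slow_handler", slow_handler, NULL, 5000000) with
a 5 millisecond timeout should set too_slow and print the warning, while
fast_handler under a 1 second timeout should not.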

I've been running this for a short while, and haven't yet seen any of the
stalls with this applied, but I've tested with smaller timeout periods in the 1
millisecond range by overloading the host, so I'm confident that the check
works.

Thanks,
Mark.

Since v1 [1]:
* Minor commit message tweaks
* Add Paul's Acked-by
* Trivial rebase to v5.13-rc4

Since v2 [2]:
* Trivial rebase to v5.14-rc1

[1] https://lore.kernel.org/r/20210112135950.30607-1-mark.rutland@xxxxxxx
[2] https://lore.kernel.org/r/20210615102507.9677-1-mark.rutland@xxxxxxx

Mark Rutland (2):
  irq: abstract irqaction handler invocation
  irq: detect long-running IRQ handlers

 kernel/irq/chip.c      | 15 +++----------
 kernel/irq/handle.c    |  4 +---
 kernel/irq/internals.h | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++
 lib/Kconfig.debug      | 15 +++++++++++++
 4 files changed, 76 insertions(+), 15 deletions(-)

--
2.11.0