Re: [PATCH 0/2] irq: detect slow IRQ handlers

From: Paul E. McKenney
Date: Tue Jan 12 2021 - 19:49:57 EST


On Tue, Jan 12, 2021 at 01:59:48PM +0000, Mark Rutland wrote:
> Hi,
>
> While fuzzing arm64 with Syzkaller (under QEMU+KVM) over a number of releases,
> I've occasionally seen some ridiculously long stalls (20+ seconds), where it
> appears that a CPU is stuck in a hard IRQ context. As this gets detected after
> the CPU returns to the interrupted context, it's difficult to identify where
> exactly the stall is coming from.
>
> These patches are intended to help tracking this down, with a WARN() if an IRQ
> handler takes longer than a given timout (1 second by default), logging the
> specific IRQ and handler function. While it's possible to achieve similar with
> tracing, it's harder to integrate that into an automated fuzzing setup.
>
> I've been running this for a short while, and haven't yet seen any of the
> stalls with this applied, but I've tested with smaller timeout periods in the 1
> millisecond range by overloading the host, so I'm confident that the check
> works.
>
> Thanks,
> Mark.

Nice!

Acked-by: Paul E. McKenney <paulmck@xxxxxxxxxx>

I added the patch below to add a three-second delay to the scheduling
clock interrupt handler. This executed, but did not cause your warning
to be emitted, probably because rcutorture runs under qemu/KVM. So no
Tested-by, not yet, anyway.

Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e04e336..dac8c7a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2606,6 +2606,8 @@ static void rcu_do_batch(struct rcu_data *rdp)
*/
void rcu_sched_clock_irq(int user)
{
+ static atomic_t invctr;
+
trace_rcu_utilization(TPS("Start scheduler-tick"));
lockdep_assert_irqs_disabled();
raw_cpu_inc(rcu_data.ticks_this_gp);
@@ -2623,6 +2625,14 @@ void rcu_sched_clock_irq(int user)
invoke_rcu_core();
lockdep_assert_irqs_disabled();

+ if (atomic_inc_return(&invctr) % 0x3ffff == 0) {
+ int i;
+
+ pr_alert("%s: 3-second delay.\n", __func__);
+ for (i = 0; i < 3000; i++)
+ udelay(1000);
+ }
+
trace_rcu_utilization(TPS("End scheduler-tick"));
}