[patch 2/4] nohz: Prevent erroneous tick stop invocations

From: Thomas Gleixner
Date: Fri Dec 22 2017 - 09:54:11 EST


The conditions in irq_exit() to invoke tick_nohz_irq_exit() are:

if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))

This is too permissive in two respects:

 1) If need_resched() is set, then the tick cannot be stopped, regardless
    of whether the CPU is idle or in nohz full mode.

 2) If need_resched() is not set but softirqs are pending, then this
    indicates that the softirq code punted and delegated the execution to
    ksoftirqd. need_resched() is not true because the currently interrupted
    task takes precedence over ksoftirqd.

Invoking tick_nohz_irq_exit() in these cases can cause an endless loop of
timer interrupts: the timer wheel contains an expired timer, but the
softirq which would expire it has not run yet. The next event evaluation
therefore returns an immediate expiry request, which causes the timer to
fire immediately again. Lather, rinse and repeat....

Prevent that by tightening the conditions: only allow the invocation when
the CPU is in idle or nohz full mode and neither need_resched() nor
local_softirq_pending() is set.

Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
---
kernel/softirq.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -382,7 +382,8 @@ static inline void tick_irq_exit(void)
 	int cpu = smp_processor_id();

 	/* Make sure that timer wheel updates are propagated */
-	if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
+	if ((idle_cpu(cpu) || tick_nohz_full_cpu(cpu)) &&
+	    !need_resched() && !local_softirq_pending()) {
 		if (!in_interrupt())
 			tick_nohz_irq_exit();
 	}