Re: regression from softlockup fix

From: Ingo Molnar
Date: Mon Nov 19 2007 - 04:44:18 EST



* David Miller <davem@xxxxxxxxxxxxx> wrote:

> I suspect that what is happening is that the NOHZ period is longer
> than the softlockup timeout (10 seconds) and we get an interrupt
> before the watchdog thread gets onto the cpu.

indeed! Does the patch below do the trick?

Ingo

--------------->
Subject: softlockup: do the wakeup from a hrtimer
From: Ingo Molnar <mingo@xxxxxxx>

David Miller reported soft lockup false-positives that trigger
on NOHZ due to CPUs idling for more than 10 seconds.

The solution is to drive the wakeup of the watchdog threads
not from the timer tick (which has no guaranteed frequency),
but from the watchdog tasks themselves.

Reported-by: David Miller <davem@xxxxxxxxxxxxx>
Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
---
 kernel/softlockup.c |    6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

Index: linux/kernel/softlockup.c
===================================================================
--- linux.orig/kernel/softlockup.c
+++ linux/kernel/softlockup.c
@@ -100,10 +100,6 @@ void softlockup_tick(void)

 	now = get_timestamp(this_cpu);
 
-	/* Wake up the high-prio watchdog task every second: */
-	if (now > (touch_timestamp + 1))
-		wake_up_process(per_cpu(watchdog_task, this_cpu));
-
 	/* Warn about unreasonable 10+ seconds delays: */
 	if (now <= (touch_timestamp + softlockup_thresh))
 		return;
@@ -141,7 +137,7 @@ static int watchdog(void *__bind_cpu)
 	while (!kthread_should_stop()) {
 		set_current_state(TASK_INTERRUPTIBLE);
 		touch_softlockup_watchdog();
-		schedule();
+		msleep(1000);
 	}
 
 	return 0;
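
For readers following the reasoning, the false positive and the effect of
the patch can be sketched as a small userspace model. This is not kernel
code and only approximates the behaviour described above. The names
tick_would_warn(), tick_driven_false_positive() and
self_driven_false_positive() are hypothetical, time is counted in whole
seconds, and the model assumes the watchdog thread last touched the
timestamp right before the CPU went idle.

```c
#include <stdbool.h>

/* Seconds; matches the "10+ seconds" comment in the patch context. */
#define SOFTLOCKUP_THRESH 10UL

/*
 * Mirrors the check that stays in softlockup_tick(): warn only when the
 * timestamp is more than SOFTLOCKUP_THRESH seconds old.
 */
static bool tick_would_warn(unsigned long now, unsigned long touch_timestamp)
{
	return now > touch_timestamp + SOFTLOCKUP_THRESH;
}

/*
 * Old scheme (assumed model): the watchdog thread is woken only from the
 * tick. During a tickless idle period nothing refreshes the timestamp, so
 * the first interrupt after the idle period sees a stale value.
 */
static bool tick_driven_false_positive(unsigned long idle_secs)
{
	unsigned long touch_timestamp = 0;	/* touched just before idling */
	unsigned long now = idle_secs;		/* first interrupt after idle */

	return tick_would_warn(now, touch_timestamp);
}

/*
 * New scheme (assumed model): the thread re-arms its own one-second sleep,
 * so the timestamp is refreshed once per second no matter how rarely the
 * tick fires.
 */
static bool self_driven_false_positive(unsigned long idle_secs)
{
	unsigned long touch_timestamp = 0;
	unsigned long t;

	for (t = 1; t <= idle_secs; t++) {
		/* Worst case: the check runs before the thread does. */
		if (tick_would_warn(t, touch_timestamp))
			return true;
		touch_timestamp = t;	/* thread wakes from its sleep */
	}
	return false;
}
```

In this model a 15 second idle period makes tick_driven_false_positive(15)
return true, while self_driven_false_positive(15) returns false because the
timestamp is never more than about one second old.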