Re: regression from softlockup fix

From: Jeremy Fitzhardinge
Date: Mon Nov 19 2007 - 12:16:25 EST


Ingo Molnar wrote:
> * David Miller <davem@xxxxxxxxxxxxx> wrote:
>
>
>> I suspect that what is happening is that the NOHZ period is longer
>> than the softlockup timeout (10 seconds) and we get an interrupt
>> before the watchdog thread gets onto the cpu.
>>
>
> indeed! Does the patch below do the trick?
>
> Ingo
>
> --------------->
> Subject: softlockup: do the wakeup from a hrtimer
> From: Ingo Molnar <mingo@xxxxxxx>
>
> David Miller reported soft lockup false-positives that trigger
> on NOHZ due to CPUs idling for more than 10 seconds.
>
> The solution is to drive the wakeup of the watchdog threads
> not from the timer tick (which has no guaranteed frequency),
> but from the watchdog tasks themselves.
>

I thought the timer code already kicked the watchdog when waking up
after a long sleep? At one point I was looking into a mechanism to
temporarily disable the watchdog during a wait for a timer event, but it
got complex and, I thought, unnecessary.

Specifically this in kernel/time/tick-sched.c:

	/*
	 * When we are idle and the tick is stopped, we have to touch
	 * the watchdog as we might not schedule for a really long
	 * time. This happens on complete idle SMP systems while
	 * waiting on the login prompt. We also increment the "start of
	 * idle" jiffy stamp so the idle accounting adjustment we do
	 * when we go busy again does not account too much ticks.
	 */
	if (ts->tick_stopped) {
		touch_softlockup_watchdog();
		ts->idle_jiffies++;
	}

Or does this happen on the sleep path? If so, wouldn't the right fix be
to do this on the wakeup path?

J