Re: Hard LOCKUP with 2.6.32.28 (maybe scheduler/tick related?)

From: Don Zickus
Date: Mon Jan 31 2011 - 08:52:24 EST


On Mon, Jan 31, 2011 at 12:05:58PM +0100, Sebastian Färber wrote:
> Hi,
>
> I recently upgraded some servers from 2.6.32.9 to 2.6.32.28 and now see
> frequent "hard lockups" on a few of them. I've compiled a kernel with
> debugging support and enabled the "NMI Watchdog" to get more information.
> I've attached my .config and the stack traces from the NMI watchdog,
> captured via a serial console.
> To me it looks like there is some problem in run_posix_cpu_timers, and
> the problem is also triggering
> WARNING: at kernel/sched_fair.c:979 hrtick_start_fair.
>
> Note that the kernel is patched with grsecurity and I'm running CONFIG_NO_HZ.
> There were no problems with 2.6.32.9.
> It would be great if someone could have a look at this; I can provide
> more information if necessary.

Your attached 'crash' details had another stack trace first. That one
shows the nmi_watchdog triggering because a spin_lock is spinning forever
in 'd_real_path'. I couldn't find that code in any upstream tree, but then
again I was too lazy to clone the stable trees. So I don't know what the
exact problem is, but if you look through the git history of 2.6.32.28 and
revert the changes that relate to 'd_real_path', you can probably work
around the problem for now, until someone who knows that code better than
I do can give you a better answer.
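
A rough sketch of that history search, in case it helps. It assumes you
have a clone of the stable tree, and that the releases are tagged
v2.6.32.9 and v2.6.32.28 there (I have not checked the tag names). The
helper names are made up for illustration; the only real tool involved is
'git log' with its -S (pickaxe) option, which lists commits whose diff
adds or removes an occurrence of a string.

```python
import subprocess


def pickaxe_command(symbol, old_tag, new_tag):
    """Build a `git log` command line listing the commits in
    old_tag..new_tag whose diff adds or removes `symbol`.

    Hypothetical helper: only the git options themselves are real.
    """
    return ["git", "log", "--oneline", f"-S{symbol}", f"{old_tag}..{new_tag}"]


def commits_touching(symbol, old_tag, new_tag, repo="."):
    """Run the pickaxe search inside `repo`; return one line per commit."""
    result = subprocess.run(
        pickaxe_command(symbol, old_tag, new_tag),
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,  # raise if git fails (bad tag, not a repository, ...)
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Tag names are assumptions; adjust them to whatever your clone uses.
    for line in commits_touching("d_real_path", "v2.6.32.9", "v2.6.32.28"):
        print(line)
```

Each line printed is a candidate to look at or revert. If nothing is
printed, the symbol was probably not introduced by anything in the stable
history between those two releases, and you would have to look elsewhere.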

Cheers,
Don