Re: 2.4.23 is freezing my systems hard after 24-48 hours

From: Philippe Troin
Date: Fri Dec 12 2003 - 16:29:50 EST


"Jeremy Kusnetz" <JKusnetz@xxxxxxxx> writes:

> I've read that enabling ip-chains compatibility would cause this,
> but I do not have this feature enabled at all.
>
> I have a cluster of 8 servers all doing the same thing that I
> upgraded to a stock 2.4.23 kernel, after that period of time one
> random one will lock up hard. No output to screen, can't sysrq or
> anything, only physically hitting the power button can get me out of
> it. I've gotten nothing in any of my logs to give any indication on
> what's going on.
>
> They don't seem to come when the server is under load, but more on
> how long the server has been up. Actually I do have this kernel
> running in my development environment, but none of those machines
> have ever locked up, it seems they need some load to eventually
> cause this to happen.
>
> I had been running 2.4.20 with no problems before the upgrade.
>
> I haven't tried running a bk series kernel yet, in the mean time
> I've downgraded to 2.4.22 with the do_brk patch. I haven't had this
> kernel up long enough to see if it will crash.

You're not alone... I have the same problems: 2.4.22 works, 2.4.23
locks up apparently randomly. I cannot get a backtrace with sysrq
either.

Have you tried running with the NMI watchdog? I cannot run it myself
because I have to disable APIC support since my motherboard is
buggy. To do so, try booting with "nmi_watchdog=1" or "nmi_watchdog=2"
depending on your configuration. Check Documentation/nmi_watchdog.txt
for details. Also verify that the NMI oopser works by checking for a
non-zero NMI count in /proc/interrupts.

If only I could get a backtrace... :-)

Phil.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/