RE: 2.4.23 is freezing my systems hard after 24-48 hours

From: Marcelo Tosatti
Date: Sat Dec 13 2003 - 09:42:15 EST




On Fri, 12 Dec 2003, Jeremy Kusnetz wrote:

> > Have you tried running with the NMI watchdog? I cannot run it myself
> > because I have to disable APIC support since my motherboard is
> > buggy. To do so, try booting with "nmi_watchdog=1" or "nmi_watchdog=2"
> > depending on your configuration. Check Documentation/nmi_watchdog.txt
> > for details. Also verify that the NMI oopser works by checking for a
> > non-zero NMI count in /proc/interrupts.
>
> Wish I knew about that earlier, unfortunately I've gone back to the
> 2.4.22 kernel as these are production boxes and I can't afford any more
> of these outages. Besides I'm tired of getting paged at 4am :) I wish
> I could get my development boxes to have this problem since it's not a
> big deal if they go down.

Are you using netfilter? There is a known netfilter hang problem.

What workload you have on the production boxes? (what is running, etc)

Can you show us a complete hardware list, /proc/interrupts output,
.config and dmesg?

Trying to reproduce the oops on the development boxes with NMI watchdog
turned on might also help a lot.



> What kind of hardware are you running? These are Compaq DL360 G1 servers.

Philippe, can you point to your archived report message? I dont seem to
have it around.




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/