Re: SMP 2.2.15pre13 unstable on Dell PE1300 - aic7xxx related?

From: David L. Parsley (lkml account) (kparse@salem.k12.va.us)
Date: Tue Mar 07 2000 - 08:42:32 EST


Hi Ingo,

On Mon, 6 Mar 2000, Ingo Molnar wrote:

> On Mon, 6 Mar 2000, David L. Parsley (lkml account) wrote:
>
> > We bought a Dell Poweredge 1300 w/ dual PIII-500 & 256M ram, and have had
> > problems with 'sudden death' - it reboots immediately and reports an error
> > along the lines of 'Memory error or NMI'. [...]
>
> hm, Linux has no such message. Do you get this 'Memory error or NMI' when
> the BIOS does the RAM check? This could be a fault in a RAM chip being
> triggered by high DMA traffic.

Yes, this message comes from the BIOS after the machine has rebooted. I'm
going to find and run memtest86, but one thing I forgot to mention is that
we fell back to a UP kernel and it ran like a champ for weeks. No SMP
kernel has gone more than about 2-3 days, sometimes much less (a few
hours). I'll also dig up Doug's mem testing script and try that.

> Sporadic reboots in the 2.2.latest kernel
> line are almost certainly a sign of hardware error. (it could also be
> overheating of any hw component: RAM, motherboard, CPU or disk)

I guess I should check the fans on both CPUs - a bad fan on CPU #2 would
certainly cause problems ;-)

thanks,
        David

--
David L. Parsley
Network Specialist
City of Salem Schools
