Re: new raid server crashed - no idea why!

From: Ross Macintyre (
Date: Fri Jul 18 2003 - 04:37:06 EST

Thanks for replying, Alan. Now that I've got your attention and have
checked some of the logs, maybe I can give you a bit more information
and ask some other questions.

On Wed, 16 Jul 2003 12:01:18 -0400 (EDT) Alan Cox <> wrote:

> > RedHat 8.0 software and the latest raid drivers from LSILogic.
> > The spec of the machine is dual processor - AMD MP 2400+ with
> > LSILogic MegaRAID 320-2, and adaptec ultra wide scsi.
> The data roughly speaking says "it broke". It's not useful log data,
> alas.
> > BIOS failed to enable PCI standards compliance, fixing this error.
> > mtrr: your CPUs had inconsistent fixed MTRR settings
> > mtrr: probably your BIOS does not setup all CPUs)
> Linux fixed both of these up. The former may indicate an old BIOS rather
> than anything much to worry about.

I'm not too concerned about these but I thought I'd mention them. I
applied the latest BIOS update but these messages didn't disappear.

> If it's a dual Athlon, run memtest86 on it for a few hours - that's my first
> guess from the traces and the fact that dual Athlons are infamously
> sensitive about RAM

Could you be a bit more specific about this please?

This isn't so easy for me to do as it means taking my main server down
for a while but I will do this if I need to. I have a machine which is
a clone of the problem machine that isn't live, and if need be, I could
swap the memory about and run memtest86 on the problem machine's memory
in the other machine. (I am assuming here that the machine needs to be
shut down to run memtest86?)

I have run 'lastcomm' on the machine that had to be reset, and found:

initlog         root   ??       0.00 secs  Wed Jul 16 10:23
initlog         root   ??       0.00 secs  Wed Jul 16 10:23
accton      S   root   ??       0.00 secs  Wed Jul 16 10:23
kswapd      F   root   ??   15385.60 secs  Mon Jun 23 12:12
automount   F   root   ??       0.00 secs  Tue Jul 15 23:14
portmap     F   rpc    ??       0.00 secs  Tue Jul 15 23:14
automount   F   root   ??       0.00 secs  Tue Jul 15 23:14

(The machine was reset on Wed Jul 16 10:23.)
This suggests to me that kswapd stopped and was the last process to run
on the machine, which is also what the kernel trace in the messages log
suggested. I have checked the swap area by remaking it with
'mkswap -c', and it passed that test OK.

I feel I really need to get to the bottom of this, as this machine will
be the home server for everyone in this department and I can't afford
to have it crashing on me. Before I test the memory, though, I want to
have looked at all the other possibilities.
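
In case it helps anyone reading the 'lastcomm' listing above, here is a
minimal Python sketch that parses lines of that shape and picks out the
entry with the largest "secs" value. The field layout (command, optional
flag letters, user, tty, seconds, date) is an assumption taken from the
seven lines pasted above, and the names AcctLine, parse_lastcomm and
busiest are made up for illustration; they are not part of any accounting
tool.

```python
import re
from typing import List, NamedTuple


class AcctLine(NamedTuple):
    command: str
    flags: str   # flag letters such as "S" or "F"; empty if none were printed
    user: str
    tty: str
    secs: float  # the "secs" column exactly as printed
    stamp: str   # trailing date/time text, kept verbatim


# Assumed layout: command, optional flag letters, user, tty, "<n> secs", date.
LINE_RE = re.compile(
    r"^(?P<command>\S+)\s+"
    r"(?:(?P<flags>[SFCDX]+)\s+)?"
    r"(?P<user>\S+)\s+(?P<tty>\S+)\s+"
    r"(?P<secs>\d+(?:\.\d+)?)\s+secs\s+"
    r"(?P<stamp>.+)$"
)

# The listing from this message, copied as-is.
SAMPLE = """\
initlog root ?? 0.00 secs Wed Jul 16 10:23
initlog root ?? 0.00 secs Wed Jul 16 10:23
accton S root ?? 0.00 secs Wed Jul 16 10:23
kswapd F root ?? 15385.60 secs Mon Jun 23 12:12
automount F root ?? 0.00 secs Tue Jul 15 23:14
portmap F rpc ?? 0.00 secs Tue Jul 15 23:14
automount F root ?? 0.00 secs Tue Jul 15 23:14
"""


def parse_lastcomm(text: str) -> List[AcctLine]:
    """Parse lastcomm-style lines; lines that do not match are skipped."""
    records = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        records.append(
            AcctLine(
                command=m.group("command"),
                flags=m.group("flags") or "",
                user=m.group("user"),
                tty=m.group("tty"),
                secs=float(m.group("secs")),
                stamp=m.group("stamp").strip(),
            )
        )
    return records


def busiest(records: List[AcctLine]) -> AcctLine:
    """Return the record with the largest value in the secs column."""
    return max(records, key=lambda r: r.secs)


if __name__ == "__main__":
    top = busiest(parse_lastcomm(SAMPLE))
    print(top.command, top.flags, top.secs, top.stamp)
```

On the listing above this singles out the kswapd line, which is the only
entry with a non-zero time. It says nothing about why kswapd stopped; it
only makes that one record easier to spot in a longer accounting log.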

Any more suggestions?
Thanks in advance,


Ross Macintyre
Heriot-Watt University
