RE: CPU failures ... or something else ?

From: Alan Cox (alan@lxorguk.ukuu.org.uk)
Date: Fri Dec 27 2002 - 18:30:54 EST


On Thu, 2002-12-26 at 06:03, Joseph D. Wagner wrote:
> > Message from syslogd@localhost at Tue Dec 24 11:30:32 2002 ...
> > localhost kernel: Kernel panic: CPU context corrupt
>
> What that basically means is that given some values A, B, and C, in the
> context of those values the kernel expects X, Y, and Z to be of some other
> value, but X, Y, and Z aren't turning out to be expected.

Ermm no.

Let's correct a few comments made here.

1. On Pentium II/III we have no known cases where MCE is misreported.
(Some old Pentiums don't have the external bits of the MCA/MCE stuff
wired up right, so they do trigger it.) Radeon IGP also triggers it, but
for what appear to be valid reasons (Linux confused the chipset badly).

2. The panic occurs because the CPU set flags saying that the CPU state
was corrupt. That is, it trapped out at a point where the previous state
could not be recovered, so the trap is fatal.

3. Don't be surprised if it moves CPU. It is possible (and not uncommon)
to set up dual systems so that both CPUs boot and race to become CPU#0.
This is actually recommended, since the box will then almost always
still boot even with one failed CPU.

Running each CPU as a single CPU test may find a faulty CPU.
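
To make point 2 concrete, here is a rough sketch of how a machine check
bank status word could be classified. The bit positions (VAL = 63,
UC = 61, PCC = 57 in IA32_MCi_STATUS) are the ones documented in the
Intel SDM. The enum and the function name classify_bank() are made up
for illustration and are not the kernel's own code, and I am assuming
that the "CPU context corrupt" message corresponds to the PCC bit.

```c
#include <stdint.h>

/* Bit positions in an IA32_MCi_STATUS bank register (per the Intel SDM). */
#define MCI_STATUS_VAL (1ULL << 63) /* bank holds a valid error record  */
#define MCI_STATUS_UC  (1ULL << 61) /* error was not corrected          */
#define MCI_STATUS_PCC (1ULL << 57) /* processor context corrupt        */

/* Hypothetical classification, for illustration only. */
enum mce_verdict {
    MCE_NONE,            /* nothing logged in this bank                 */
    MCE_CORRECTED,       /* hardware fixed it; log and carry on         */
    MCE_UNCORRECTED,     /* data lost, but CPU state is still usable    */
    MCE_CONTEXT_CORRUPT  /* previous state cannot be recovered: fatal   */
};

/* Classify one bank's 64-bit status value (hypothetical helper). */
enum mce_verdict classify_bank(uint64_t status)
{
    if (!(status & MCI_STATUS_VAL))
        return MCE_NONE;            /* other bits are meaningless here  */
    if (status & MCI_STATUS_PCC)
        return MCE_CONTEXT_CORRUPT; /* assumed source of the panic text */
    if (status & MCI_STATUS_UC)
        return MCE_UNCORRECTED;
    return MCE_CORRECTED;
}
```

The point of checking PCC before UC is that a corrupt context leaves
nothing safe to resume, whatever else the bank reports, so a handler
following this sketch would have no option but to stop.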
