Re: dying hdd causing MCE and panic (libata)

From: Alan Cox
Date: Sun Apr 20 2008 - 06:01:48 EST


> - Is it normal that a simple hard disk failure (that is not even
> the system disk) causes MCEs and kernel panics?

No, but with old-style (pre-AHCI) IDE it can, because the system may MCE if
the CPU<->disk path hangs. If a drive is causing PSU problems it could also
occur, I guess; ditto heat. You'd need to decode the MCE to tell which.

> - Is this a problem that is induced completely on the hardware
> level (eg. the southbridge going crazy and making the whole
> hardware platform unstable) or a problem that could be fixed
> or handled properly on the software (kernel) level?

An MCE the hardware can recover from is reported and we continue. CPU
Context Corrupt means the processor internally set a "cannot carry on"
indicator.
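
As a rough illustration of what decoding involves, here is a minimal Python
sketch that splits a raw 64-bit bank status value (IA32_MCi_STATUS) into its
architectural flags. The bit positions follow the Intel SDM layout as I
understand it; the function name, the returned dictionary keys and the sample
value are made up for the example, not taken from a real log. The "context
corrupt" indicator corresponds to the PCC bit (bit 57).

```python
# Architectural flag bits of an IA32_MCi_STATUS value (Intel SDM layout).
MCI_STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting was enabled for this error
    59: "MISCV",  # MCi_MISC holds extra information
    58: "ADDRV",  # MCi_ADDR holds an address
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(status):
    """Split a raw 64-bit status value into flags and error codes.

    Hypothetical helper for illustration only; a real decoder also
    interprets the MCA error code and IA32_MCG_STATUS.
    """
    flags = [name for bit, name in MCI_STATUS_FLAGS.items()
             if (status >> bit) & 1]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,               # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
        "context_corrupt": "PCC" in flags,
    }


# Made-up sample value with VAL, UC, EN and PCC set.
sample = decode_mci_status(0xB200000000000800)
print(sample["flags"], sample["context_corrupt"])
```

If PCC is clear and the error was corrected, that is the "reported and we
continue" case; with PCC set the CPU state cannot be trusted. The meaning of
the low 16-bit error code depends on the CPU model, so for real logs a tool
such as mcelog is the better route.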

Alan