Re: Linux 2.6.2, AMD kernel: MCE: The hardware reports a non fatal, correctable incident

From: Philippe Elie
Date: Wed Mar 03 2004 - 02:59:24 EST


On Tue, 02 Mar 2004 at 21:55 +0000, Dave Jones wrote:

> On Tue, Mar 02, 2004 at 07:00:16PM +0100, Davi Leal wrote:
> > What about this message?. Note that the system works. I have not had to
> > reboot. What meens the below message?.
> >
>
> The original plan behind that option was to find hardware faults early,
> but it seems to trigger a lot of false positives for various reasons.
> Part of this problem is that MCEs can also be generated on some hardware
> by doing something silly like reading from a reserved part of your
> motherboard chipset..
>
> There are also CPU errata that can cause them to falsely trigger in
> some unusual cases, but I've not had time to go through the various
> errata datasheets to blacklist affected CPUs unfortunatly.
>
> I'm toying with the idea of marking it CONFIG_BROKEN for 2.6,
> and fixing it up later.

I'm unsure if it's a good idea it's broken only on broken HW, people
wanting stable box try to buy sane HW and don't enable CONFIG_BROKEN
so they will never see if their HW are starting to be out of spec.

Perhaps rewording the option help and the error message to say it's
known to report false positive...


regards,
Phil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/