Re: Machine check expection panic

From: Andi Kleen (ak@suse.de)
Date: Wed Aug 06 2003 - 20:00:14 EST


Dave Jones <davej@redhat.com> writes:
> #
> diff -Nru a/arch/i386/kernel/cpu/mcheck/k7.c b/arch/i386/kernel/cpu/mcheck/k7.c
> --- a/arch/i386/kernel/cpu/mcheck/k7.c Wed Aug 6 23:33:40 2003
> +++ b/arch/i386/kernel/cpu/mcheck/k7.c Wed Aug 6 23:33:40 2003
> @@ -81,7 +81,7 @@
> wrmsr (MSR_IA32_MCG_CTL, 0xffffffff, 0xffffffff);
> nr_mce_banks = l & 0xff;
>
> - for (i=0; i<nr_mce_banks; i++) {
> + for (i=1; i<nr_mce_banks; i++) {

The change looks rather suspicious to me.

Bank 0 is the data cache unit (DC)

Do you have an errata that says that the DC bank is bad on all Athlons?

Normally BIOS or microcode are supposed to turn off bad MCEs by
masking them in another register. Maybe the person's CPU has a
real problem that is just masked now, e.g. it could be overclocked
and stress the cache too much.

The original MCE was:

Status: (4) Machine Check in progress.
Restart IP invalid.
parsebank(0): f606200000000833 @ 4040
        External tag parity error
        Uncorrectable ECC error
        CPU state corrupt. Restart not possible
        Address in addr register valid
        Error enabled in control register
        Error not corrected.
        Error overflow
        Bus and interconnect error
        Participation: Local processor originated request
        Timeout: Request did not timeout
        Request: Generic error
        Transaction type : Instruction
        Memory/IO : Other

Tyan 2466 motherboard
2 Athon MP 1200 processors (1200?)

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Aug 07 2003 - 22:00:36 EST