Re: Opteron fatal machine check during PCI probe

From: Andi Kleen
Date: Wed Jun 16 2004 - 19:29:33 EST


On Wed, 16 Jun 2004 17:06:02 -0700
Tim Hockin <thockin@xxxxxxxxxx> wrote:

> Hey all,
>
> I have a couple of dual Opteron boxen that consistently get an MCE during
> PCI probing. This is from linux-2.6.6, but the EXACT same scenario happens
> on a 2.4.x kernel.

> The MCE shows that the error is an IO read, with the address 0xfdfc000cfe.
> The RIP points to pci_conf1_read(), when we try to inw() from the PCI data
> register.

Is it a master abort (0x100 set in MC4_STATUS)?
If yes, it's a BIOS issue; the BIOS is supposed to disable that one.

> This is called during the PCI probing, and stops the kernel dead in its
> tracks. The disassembly of the surrounding code is:
>
> ffffffff802822c5: 89 ca mov %ecx,%edx
> ffffffff802822c7: 83 e2 02 and $0x2,%edx
> ffffffff802822ca: 66 81 c2 fc 0c add $0xcfc,%dx
> ffffffff802822cf: 66 ed in (%dx),%ax
>
> This all seems legit to me.
>
> What is interesting is that the address 0xfdfc000cfe is correct in the
> low-order 16 bits. The extra 0xfdfc000000 is what is puzzling to me, or
> maybe it's a red herring.

It is. The in instruction only uses the low 16 bits of its port operand.
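
As a rough illustration of that point, the sketch below masks the reported
address down to the 16 bits that can name an I/O port and compares it with
the port the quoted disassembly computes (0xcfc plus bit 1 of the register
offset). The helper names conf1_word_port() and io_port_of() are made up for
this example and are not kernel functions; the register offset 0x02 is only
an assumed value with bit 1 set.

```c
#include <stdint.h>

/* Data port of PCI configuration mechanism #1 (the 0xcfc in the disassembly). */
#define PCI_CONF1_DATA_PORT 0xcfcu

/*
 * Hypothetical helper: port used for a 16-bit config read, mirroring
 *   and $0x2,%edx ; add $0xcfc,%dx
 * from the quoted disassembly.
 */
static uint16_t conf1_word_port(uint32_t reg)
{
    return (uint16_t)(PCI_CONF1_DATA_PORT + (reg & 2u));
}

/*
 * Hypothetical helper: keep only the low 16 bits of a reported address,
 * since an I/O port number is 16 bits wide.
 */
static uint16_t io_port_of(uint64_t reported_addr)
{
    return (uint16_t)(reported_addr & 0xffffu);
}
```

With the address from the MCE, io_port_of(0xfdfc000cfeULL) and
conf1_word_port(0x02) both come out as 0xcfe, so the low 16 bits are
consistent with a word read from the upper half of the data register.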


-Andi