Re: Early boot hang on recent 2.6 kernels (> 2.6.3), on x86-64 with 16gb of RAM

From: Andi Kleen
Date: Mon Sep 18 2006 - 03:51:04 EST


Robin Lee Powell <rlpowell@xxxxxxxxxxxxxxxxxx> writes:
>
> This version is rather different, as it ends in:
>
> HARDWARE ERROR
> CPU 0: Machine Check Exception: 7 Bank 3: b40000000000083b
> RIP 10:<ffffffff80446e3e> {pci_conf1_read+0xbe/0xf0}
> TSC 2e7932dbf8 ADDR fdfc000cfc
> This is not a software problem!
> Run through mcelog --ascii to decode and contact your hardware vendor
> Kernel panic - not syncing: Uncorrected machine check

Decoded, it gives:

..
bus error 'local node origin, request didn't time out
data read mem transaction
i/o access, level generic'
..

It will probably boot with mce=off acpi=off pci=conf1

You have a buggy device that causes a bus timeout when its config space
is read. The old kernel most likely avoided touching it only by luck.

Please add the following patch and send the whole log.
This will tell us which device has this problem.

-Andi

diff -u linux-2.6.17-hack/arch/i386/pci/direct.c-o linux-2.6.17-hack/arch/i386/pci/direct.c
--- linux-2.6.17-hack/arch/i386/pci/direct.c-o 2006-04-20 02:17:33.000000000 +0200
+++ linux-2.6.17-hack/arch/i386/pci/direct.c 2006-09-18 09:48:46.000000000 +0200
@@ -19,6 +19,9 @@
 {
 	unsigned long flags;
 
+	printk("conf1 read bus %x devfn %x reg %x len %u\n",
+		bus, devfn, reg, len);
+
 	if ((bus > 255) || (devfn > 255) || (reg > 255)) {
 		*value = -1;
 		return -EINVAL;