Re: Linux 2.6.22-rc2

From: Stephen Hemminger
Date: Tue May 22 2007 - 20:29:52 EST


Linus Torvalds wrote:
On Tue, 22 May 2007, Mike Houston wrote:
In this case I actually had the kernel crash. First time for me ever
having a kernel oops! System locked up with keyboard LED's blinking.

Not sure if anyone wants to see all of it (maybe some screwy
userland stuff involved), so I won't include that mess in the
message. It's here:
http://www.mikeserv.org/files/kernelcrash.txt

I think you have major memory corruption. That first oops disassembles to

mov 0x10(%eax),%esi
mov $0xfffffdfd,%eax
test %esi,%esi
je after_call
mov %edx,%ecx
mov %edi,%eax
mov %ebx,%edx
call *%esi
after_call:

which is (from net/ipv4/af_inet.c, inet_ioctl()):

default:
if (sk->sk_prot->ioctl)
err = sk->sk_prot->ioctl(sk, cmd, arg);
else
err = -ENOIOCTLCMD;
break;

and the load off "sk->sk_prot->ioctl" oopses, because "sk->sk_prot" is corrupt and contains 0x8e3cad42, which is not a valid kernel pointer.

The other oops is even worse.

I also think it meshes with

sky2 eth0: descriptor error q=0x280 get=285 [800042375e2e5e] put=285

Descriptor error means, the driver told it to do something but the OWNER bit wasn't set.
Only ever saw this on the Gigabyte motherboard.

It looks like the chip reads the wrong memory sometimes. The problem happens only on the on-board NIC's
and only on this kind of motherboard. For testing, I have put code in to check that the receive data actually
arrived before the IRQ, it triggered on my Gigabyte 925 motherboard. It appears that DMA access
is messed up. This board has lots of "overclocker" friendly stuff; maybe the BIOS never really sets up the PCI
bridges and clocks properly.

It doesn't seem like a software or driver problem. I have tried tweaking PCI registers but nothing worked
in this case.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/