Re: 2.6.18 BUG: unable to handle kernel NULL pointer dereferenceat virtual address 000,0000a

From: Christian Weiske
Date: Sun Sep 24 2006 - 05:10:51 EST


Andrew,


>> I have a reproducible BUG on my server that occurs whenever disk usage
>> gets too high / too much swapping occurs (at least I think that is). The
>> box has one reiserfs filesystem of about 187GB size, the disk is on an
>> Epia 5000 board, between them is a Promise Ultra 100 PCI IDE controller
>> card.
> Do you think this bug is due to the 2.6.18 upgrade?

No. I already had it in 2.6.17.6.

> Have you run fsck across the filesystem(s)?
fsck at boot turns up
> ReiserFS: hde3: checking transaction log (hde3)
> ReiserFS: hde3: replayed 22 transactions in 0 seconds
> ReiserFS: hde3: Using r5 hash to sort names
nothing more

> Does the oops always look the same as this one?
No, not exactly the same. I attach three log files. If you diff them,
there will be about 30% of the lines different.

One thing I have to note is that the second Oops appears about 10
seconds after the first one.

> Please turn on the various CONFIG_DEBUG_* options, see if that turns up
> anything.
That indeed turns up something. The debug messages indicate that java
wants to lock something and gets stuck. Note that the messages until
"slab corruption" are printed first, and the others about a minute or
two later.

And I still can ping and do everything until the slab corruption occurs.
(Thus the other messages some minute later)


> It would be interesting to find out if enabling CONFIG_4KSTACKS makes this
> go away (although I'm not sure why).
Didn't try this yet, but will.

I put the logs in a tar.bz2 because I didn't want to flood the list with
a 200k message.

--
Regards/MfG,
Christian Weiske

Attachment: dojo kernelpanic + debug.tar.bz2
Description: Binary data

Attachment: signature.asc
Description: OpenPGP digital signature