Re: 2.6.25-git2: BUG: unable to handle kernel paging request at ffffffffffffffff

From: Linus Torvalds
Date: Mon Apr 21 2008 - 21:15:49 EST




On Tue, 22 Apr 2008, Rafael J. Wysocki wrote:
> >
> > The same place, dentry.d_hash.next is 1. No slub debug clues... I think, I'll
> > give slab a try. Any other clues?
>
> Well, SLUB uses some per CPU data structures. Is it possible that they get
> corrupted, which leads to the observed symptoms?

It really doesn't look like the slub allocations themselves would be
corrupted. It very much looks like wild pointers corrupting allocations
that themselves were fine.

The nybble pattern looked intriguing (especially as it apparently also hit
a normal page cache page!) but obviously not everything matches that
pattern (eg your value of 1).

What do you do to trigger this? Any particular load? Is it still just
doing suspend/resume, or do you have something else that you are playing
with?

Also, have you tried CONFIG_DEBUG_PAGEALLOC? That can also be a very
powerful way to find memory corruption.

Does anybody see any other patterns? Looking at the linked-in modules in
the oopses from Zdenek, Rafael and Jiri, I don't see anything odd. You
all have 80211 support, so maybe the corruption comes from the wireless
layer?

Or maybe it's the x86 code changes themselves, and it really is about the
suspend/resume sequence itself. Are all the people who see this doing
suspends?

Linus