Re: [bug] mm/slab.c boot crash in -git, "kernel BUG at mm/slab.c:2103!"

From: Ingo Molnar
Date: Fri Apr 11 2008 - 05:25:40 EST



* Pekka Enberg <penberg@xxxxxxxxxxxxxx> wrote:

> On Fri, Apr 11, 2008 at 12:05 PM, Pekka Enberg <penberg@xxxxxxxxxxxxxx> wrote:
> > > Right. Then you probably want to look into any changes in arch/x86/
> > > related to setting up the zonelists. I'm fairly certain this is not a
> > > slab bug and I don't see any recent changes to the page allocator
> > > either that would explain this.
> >
> > I'd be willing to put some money on this:
> >
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b7ad149d62ffffaccb9f565dfe7e5bae739d6836
>
> And I'd lose as you're 32-bit. Oh well, that's the price to pay for
> pretending to know x86 arch internals.

yeah, sorry - we are working hard to unify generic bits like that, but
it's a huge architecture.

btw., i always felt that the zone/memory setup is rather fragile and
ad-hoc in places and it trusts the architecture code too much. Just in
the .25 cycle i've seen about a dozen bugs all around that thing. I
believe we should work on making the info that an architecture feeds to
the MM "fool proof" - i.e. sanity-check for overlaps and other common
setup errors. It is easy for an architecture to mess up those things...
Especially on oddball systems that are too large or too small to be
normally tested. It's a common, recurring bug pattern that we could
avoid by being a bit more resilient.

if this is a zone setup bug then a sanity-check could catch it right
where it happens - not much later in the slab code or so.
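
As a rough illustration of the kind of check meant above, here is a
minimal userspace C sketch that validates a list of page-frame ranges
for empty, inverted and overlapping entries. Note that struct mem_range
and check_mem_ranges() are hypothetical names invented for this example,
not existing kernel interfaces; a real check would sit in the boot-time
zone setup path and work on whatever the architecture actually registers.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical description of one memory range handed to the MM. */
struct mem_range {
	unsigned long start_pfn;	/* first page frame, inclusive */
	unsigned long end_pfn;		/* one past the last page frame */
};

/* Two half-open ranges overlap iff each starts before the other ends. */
static int ranges_overlap(const struct mem_range *a,
			  const struct mem_range *b)
{
	return a->start_pfn < b->end_pfn && b->start_pfn < a->end_pfn;
}

/*
 * Return the number of problems found: empty or inverted ranges, plus
 * pairwise overlaps between valid ranges. This is O(n^2), which is
 * acceptable for the handful of ranges seen at boot.
 */
static int check_mem_ranges(const struct mem_range *r, size_t nr)
{
	int errors = 0;
	size_t i, j;

	for (i = 0; i < nr; i++) {
		if (r[i].start_pfn >= r[i].end_pfn) {
			fprintf(stderr,
				"range %zu is empty or inverted: [%lx, %lx)\n",
				i, r[i].start_pfn, r[i].end_pfn);
			errors++;
			continue;
		}
		for (j = i + 1; j < nr; j++) {
			/* Invalid ranges are reported on their own pass. */
			if (r[j].start_pfn >= r[j].end_pfn)
				continue;
			if (ranges_overlap(&r[i], &r[j])) {
				fprintf(stderr,
					"ranges %zu and %zu overlap\n", i, j);
				errors++;
			}
		}
	}
	return errors;
}
```

The point of the sketch is only that the complaint is raised at
registration time, naming the offending ranges, instead of surfacing
later as an unrelated-looking BUG in an allocator.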

Ingo