Re: [bug] SLUB + mm/slab.c boot crash in -rc9

From: Ingo Molnar
Date: Tue Apr 15 2008 - 03:08:51 EST



* Pekka Enberg <penberg@xxxxxxxxxxxxxx> wrote:

> On Tue, Apr 15, 2008 at 9:25 AM, Ingo Molnar <mingo@xxxxxxx> wrote:
> > so it's probably the first few page allocations (setup_cpu_cache())
> > going wrong already - suggesting some fundamental borkage in SLAB?
>
> I think it's still pointing to the page allocator and/or setting up
> the zonelists...

i did a .config bisection and it pinpointed CONFIG_SPARSEMEM=y as the
culprit. Changing it to FLATMEM gives a correctly booting system.

if you look at the good versus bad bootup log:

http://redhat.com/~mingo/misc/log-Tue_Apr_15_07_24_59_CEST_2008.good
http://redhat.com/~mingo/misc/log-Tue_Apr_15_07_24_59_CEST_2008.bad

(both SLUB) you'll see that the zone layout provided by the architecture
code is _exactly_ the same and looks sane as well. So this is not an
architecture zone layout bug, this is probably sparsemem setup (and/or
the page allocator) getting confused by something.

why are there no good debug logs possible in this area? To debug such
bugs we'd need an early dump of the precise layout of all memory maps,
what points where, how large it is, where it is allocated - and then
compare it with how the rest of the system is laid out - looking at
possible overlaps or other bugs. This 8-way box is a pain to debug on,
it takes a long time to boot it up, etc. etc.

Ingo