Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards

From: James Bottomley
Date: Thu Apr 21 2011 - 17:22:44 EST


On Thu, 2011-04-21 at 16:07 -0500, Christoph Lameter wrote:
> On Thu, 21 Apr 2011, James Bottomley wrote:
>
> > > Dave Hansen, Mel: Can you provide us with some help? (It's Easter and so
> > > the Europeans may be off for a while)
> >
> > It sort of depends on your definition of easy. The problem going from
> > DISCONTIGMEM to SPARSEMEM is sorting out the section size (the minimum
> > indivisible size for a sectional_mem_map array) and also deciding on
> > whether you need SPARSEMEM_EXTREME (discontigmem allows arbitrarily
> > different sizes for each contiguous region) or
> > ARCH_HAS_HOLES_MEMORYMODEL (allows empty mem_map regions as well). I
> > suspect most architectures will want SPARSEMEM_EXTREME (it means that
> > the section array isn't fully populated) because the gaps can be huge
> > (we've got a 64GB gap on parisc).
>
> Well my favorite is SPARSEMEM_VMEMMAP because it allows page level holes
> and uses the TLB (via page tables) to avoid lookups in the SPARSE maps but
> that is likely not going to be in an initial fix.

Really, no ... that requires additional pte insertion logic and some
other stuff that's nasty to craft and requires significant testing.

> > However, even though I think we can do this going forwards ... I don't
> > think we can backport it as a bug fix for the slub panic.
>
> So far there seems to be no other solution that will fix the issues
> cleanly since we have a clash of the notions of a node in !NUMA between
> core and discontig. Which is a pretty basic thing to get wrong.

Yes, there is ... there's the slub patch or the marking as broken.
Either is much simpler.

> If we can avoid all the fancy stuff and Dave can just get a minimal SPARSE
> config going then this may be the best solution for stable as well.
>
> But then these configs have been broken for years and no one noticed. This
> means the users of these arches likely have been running a subset of
> kernel functionality. I suspect they have never freed memory from
> DISCONTIG node 1 and higher without CONFIG_DEBUG_VM on. Otherwise I
> cannot explain why the VM_BUG_ONs did not trigger in
> mm/page_alloc.c:move_freepages() that should have been brought to the MM
> developers' attention.

Yes they have. As willy said, they've just never been run with DEBUG_VM
or HUGEPAGES or, until recently, SLUB. The test boxes (at least for
parisc) get hammered quite a lot to flush out coherency issues. That's
why I'm confident this panic only triggers for slub. I found the panic
within about two days of turning SLUB on.

> This set of circumstances leads to the suspicion that there were only
> tests run that showed that the kernel booted. Higher node memory was never
> touched and the MM code was never truly exercised.

Look, try to stay on point with logic: they have been extensively
tested, just not in the slub configuration, which is the only one that
crashes. As I explained (several times), we're just now picking up slub
because Debian now enables it by default.

> So I am not sure that there is any urgency in this matter. No one has
> cared for years after all.

If we didn't care, we wouldn't be making all this fuss. It's only a
couple of days since the bug was reported, which should indicate the
high importance attached to it (well, by everyone except you,
apparently).

James

