Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413

From: Benjamin Herrenschmidt
Date: Tue Nov 22 2011 - 16:58:39 EST


On Tue, 2011-11-22 at 08:48 +0100, Eric Dumazet wrote:
> On Monday, 21 November 2011 at 21:18 -0600, Christoph Lameter wrote:
>
> > Hmmm... That means that c->page points to a page that is not frozen. Per cpu
> > partial pages are frozen until they are reused or until the partial list
> > is flushed.
> >
> > Does this ever happen on x86 or only on other platforms? In put_cpu_partial() the
> > this_cpu_cmpxchg really needs to be irq safe. this_cpu_cmpxchg is
> > only preempt safe.
> >
> > Index: linux-2.6/mm/slub.c
> > ===================================================================
> > --- linux-2.6.orig/mm/slub.c 2011-11-21 21:15:41.575673204 -0600
> > +++ linux-2.6/mm/slub.c 2011-11-21 21:16:33.442336849 -0600
> > @@ -1969,7 +1969,7 @@
> > page->pobjects = pobjects;
> > page->next = oldpage;
> >
> > - } while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page) != oldpage);
> > + } while (irqsafe_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page) != oldpage);
> > stat(s, CPU_PARTIAL_FREE);
> > return pobjects;
> > }
> >
>
> For x86, I wonder if our !X86_FEATURE_CX16 support is correct on SMP
> machines.
>
> this_cpu_cmpxchg16b_emu() claims to be IRQ safe, but may be buggy...
>
> Could we have somewhere a NMI handler calling kmalloc() ?

Christian and I are on ppc, which uses the generic implementation of
this_cpu_cmpxchg(), which is not irq safe. So the above patch is needed
regardless.
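
To illustrate the race, here is a rough user-space sketch. It assumes the
generic fallback boils down to a plain read/compare/write with only
preemption disabled. It is not kernel code: generic_cmpxchg(),
irqsafe_cmpxchg(), fake_irq(), irq_window() and run_demo() are names made
up for this sketch, and the "interrupt" is just a hook called inside the
window between the read and the write.

```c
#include <stddef.h>

struct page {
	struct page *next;
};

/* Stand-in for the per-cpu pointer s->cpu_slab->partial. */
static struct page *partial;

/* Simulated interrupt state (hypothetical, for this sketch only). */
static int irqs_off;
static int irq_pending;
static struct page irq_page;

/* "Interrupt handler": puts a page onto the same per-cpu list. */
static void fake_irq(void)
{
	irq_page.next = partial;
	partial = &irq_page;
}

/* Deliver the pending interrupt unless interrupts are masked. */
static void irq_window(void)
{
	if (irq_pending && !irqs_off) {
		irq_pending = 0;
		fake_irq();
	}
}

/* Plain read/compare/write: an interrupt can land between read and write. */
static struct page *generic_cmpxchg(struct page **p, struct page *oldp,
				    struct page *newp)
{
	struct page *ret = *p;

	irq_window();		/* the window the patch is meant to close */
	if (ret == oldp)
		*p = newp;
	return ret;
}

/* Same sequence with (simulated) interrupts masked around it. */
static struct page *irqsafe_cmpxchg(struct page **p, struct page *oldp,
				    struct page *newp)
{
	struct page *ret;

	irqs_off = 1;
	ret = generic_cmpxchg(p, oldp, newp);
	irqs_off = 0;
	irq_window();		/* pending interrupt runs once unmasked */
	return ret;
}

/* Push one page while an interrupt is pending; return final list length. */
static int run_demo(int irqsafe)
{
	struct page pg = { NULL };
	struct page *old, *it;
	int len = 0;

	partial = NULL;
	irq_pending = 1;
	do {
		old = partial;
		pg.next = old;
	} while ((irqsafe ? irqsafe_cmpxchg : generic_cmpxchg)
			(&partial, old, &pg) != old);

	for (it = partial; it; it = it->next)
		len++;
	return len;
}
```

With the plain variant the page queued by the simulated interrupt is
silently overwritten and drops off the list; with the irq-masked variant
both pages end up on it. In the sketch the interrupt always hits the
window, whereas on real hardware it would only do so occasionally.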

Christian, can you try it and see if that helps in your case?

Cheers,
Ben.
