Re: [PATCH 0/5] make slab gfp fair

From: Peter Zijlstra
Date: Wed May 16 2007 - 16:41:34 EST


On Wed, 2007-05-16 at 13:27 -0700, Christoph Lameter wrote:
> On Wed, 16 May 2007, Peter Zijlstra wrote:
>
> > > So its no use on NUMA?
> >
> > It is, its just that we're swapping very heavily at that point, a
> > bouncing cache-line will not significantly slow down the box compared to
> > waiting for block IO, will it?
>
> How does all of this interact with
>
> 1. cpusets
>
> 2. dma allocations and highmem?
>
> 3. Containers?

Much like the normal kmem_cache would do; I'm not changing any of the
page allocation semantics.

For containers it could be that the machine is not actually swapping but
the container will be in dire straights.

> > > The problem here is that you may spinlock and take out the slab for one
> > > cpu but then (AFAICT) other cpus can still not get their high priority
> > > allocs satisfied. Some comments follow.
> >
> > All cpus are redirected to ->reserve_slab when the regular allocations
> > start to fail.
>
> And the reserve slab is refilled from page allocator reserves if needed?

Yes, using new_slab(), exacly as it would normally be.

> > > But this is only working if we are using the slab after
> > > explicitly flushing the cpuslabs. Otherwise the slab may be full and we
> > > get to alloc_slab.
> >
> > /me fails to parse.
>
> s->cpu[cpu] is only NULL if the cpu slab was flushed. This is a pretty
> rare case likely not worth checking.

Ah, right:
- !page || !page->freelist
- and no available partial slabs.

then we try the reserve (if we're entiteld).

> > > Remove the above two lines (they are wrong regardless) and simply make
> > > this the cpu slab.
> >
> > It need not be the same node; the reserve_slab is node agnostic.
> > So here the free page watermarks are good again, and we can forget all
> > about the ->reserve_slab. We just push it on the free/partial lists and
> > forget about it.
> >
> > But like you said above: unfreeze_slab() should be good, since I don't
> > use the lockless_freelist.
>
> You could completely bypass the regular allocation functions and do
>
> object = s->reserve_slab->freelist;
> s->reserve_slab->freelist = object[s->reserve_slab->offset];

That is basically what happens at the end; if an object is returned from
the reserve slab.

But its wanted to try the normal cpu_slab path first to detect that the
situation has subsided and we can resume normal operation.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/