Re: The Slab Allocator

Colin Plumb (
Thu, 7 Aug 97 00:55:22 MDT

Martin von Loewis wrote:
> You are probably thinking of a mechanism where you have a list of
> free objects, and do the allocation inline (invoking SLAB if the list
> is empty). I was wondering myself why this has not been done. Here
> are a couple of possible reasons:
> - Risk of fragmentation: you somehow have to choose when to return memory
> to SLAB, otherwise you risk eating too much memory.
> - Race conditions: for file system code, and possibly process management,
> interference with interrupt handlers is not an issue. However, on SMP
> systems, dirty things can happen. Per-processor lists might help, but also
> might worsen the first problem: one processor always allocates from SLAB
> because its list is empty, while the other always puts released entries
> into its own list.
> - Limited functionality: I still don't understand what those *structors
> are used for - but you certainly could not support them in 2 xchg
> instructions. The only application I can think of is that you don't need
> to re-initialize items returned from the front-end list, if they are
> left in the state they had upon free().
> The first problem could be solved by putting an upper limit on the number
> of cached objects (a small number, like 4 or 8 - fine tuning would be
> subject to profiling). Could the second problem be solved with bus locking?

The usual thing is this:
- Each processor has a private cache that can be accessed unlocked
(or with whatever uniprocessor locking is required, which is at
most a cli), and
- If the private cache is empty, the processor falls back to a global
slab system, which does require locking but is hopefully hit rarely.
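The two-level scheme above can be sketched as follows. This is a
minimal single-type illustration, not actual kernel code; the names
(cpu_cache, global_slab_alloc, and so on) are invented, and malloc()
stands in for a real slab refill. The point is that the fast path
touches only the processor-private list, and the lock appears only on
the rare global path.

```c
#include <stdlib.h>
#include <pthread.h>

struct obj { struct obj *next; };

struct cpu_cache {
	struct obj *free_list;	/* private to one CPU: no lock needed */
};

static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static struct obj *global_list;

/* Slow path: the shared slab layer, protected by a lock. */
static struct obj *global_slab_alloc(void)
{
	struct obj *o;

	pthread_mutex_lock(&global_lock);
	o = global_list;
	if (o)
		global_list = o->next;
	pthread_mutex_unlock(&global_lock);
	return o ? o : malloc(sizeof *o);	/* stand-in for slab refill */
}

/* Fast path: pop from the private list, unlocked. */
static struct obj *cache_alloc(struct cpu_cache *c)
{
	struct obj *o = c->free_list;

	if (o) {
		c->free_list = o->next;
		return o;
	}
	return global_slab_alloc();		/* rare, locked */
}

/* Frees always go to the private list, unlocked. */
static void cache_free(struct cpu_cache *c, struct obj *o)
{
	o->next = c->free_list;
	c->free_list = o;
}
```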

Thus, the race condition that you refer to in the second point cannot
occur: each private cache is touched by only one processor, and the
shared slab layer is protected by its own lock.
The *structors are to let you cache setup work between slab allocations.
For example, dentries have some linked lists which have initial null
states (forward and back pointing to self) which have to be set up on
allocation, but are true on deallocation. Thus, when you recycle an
entry, you don't have to reinitialize them. The constructor and
destructor are for when data is received from and returned to the
system pool.

Jeff Bonwick's version has a debugging option that calls them on every
object allocation and deallocation. Things like wait queues, reference
counts, and so on, which begin as zero and end as zero, are useful
initializations to avoid.
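The constructor idea can be made concrete with a small sketch. The
struct and names here are invented for illustration (a dentry-like
object with one list head): the constructor runs once, when the object
first comes from the system pool, and puts the list into its null
state (forward and back pointing to self); free() leaves it that way,
so a recycled object needs no re-initialization.

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

struct dentry_like {
	struct list_head d_subdirs;	/* empty = pointing to itself */
	int refcount;			/* zero at alloc, zero again at free */
};

/* Called only when the object enters the cache from the system pool,
 * not on every allocation. */
static void dentry_ctor(struct dentry_like *d)
{
	d->d_subdirs.next = &d->d_subdirs;
	d->d_subdirs.prev = &d->d_subdirs;
	d->refcount = 0;
}

/* The invariant a well-behaved free() must restore, so the object can
 * be handed out again without running the constructor. */
static int dentry_in_null_state(const struct dentry_like *d)
{
	return d->d_subdirs.next == &d->d_subdirs &&
	       d->d_subdirs.prev == &d->d_subdirs &&
	       d->refcount == 0;
}
```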

As for the first problem, one suggestion I have is to tackle the hit rate
head-on. Establish a target front-end hit rate, say 99%. Then do the
following on each allocation:

if (there's something in the local cache) {
	allocate it for return
	if (--score < 0) {
		get some more from the local cache and return it to the
		global pool
		score += TUNABLE
	}
} else {
	score += 100-TUNABLE;
	allocate something from the global pool;
}

This will maintain the pool size at the level necessary to keep a
99% hit rate. If it's repeated allocation and deallocation of one
object, the local cache will contain one object. If it varies a lot,
the local cache will be bigger.

The logic is easiest to see if TUNABLE is 0, but over the long run
the number of allocations from, and frees to, the global pool will be
the same, so TUNABLE can have any value and maintain the same hit rate.
A low value of TUNABLE causes the cache to explode initially, as all
the early allocations miss and score goes up a lot. It then takes a
long time to settle down, since objects won't be freed until the
overall miss rate drops below 1%, and there are a lot of initial
misses to make up for.

A high value reduces that effect, but causes the cache to be slow to
free resources when usage goes down.
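Here is the same score scheme as runnable code, with the local cache
reduced to a counter so the control logic stands alone. The workload
and the value of TUNABLE are mine, chosen for demonstration; the
branch structure follows the pseudocode above.

```c
#define TUNABLE 50	/* any value in 0..100 gives the same long-run hit rate */

static long cache_size;	/* objects sitting in the local cache */
static long score;
static long hits, misses;

static void sim_alloc(void)
{
	if (cache_size > 0) {		/* hit: the common, cheap path */
		cache_size--;		/* hand one object out */
		hits++;
		if (--score < 0) {	/* cache looks too big: trim it */
			if (cache_size > 0)
				cache_size--;	/* return one to the global pool */
			score += TUNABLE;
		}
	} else {			/* miss: go to the global pool */
		misses++;
		score += 100 - TUNABLE;
	}
}

static void sim_free(void)
{
	cache_size++;			/* frees land in the local cache */
}
```

Running a workload that repeatedly frees and reallocates a single
object keeps the cache at one object and never misses after startup,
matching the behavior described above; a more varied workload makes
the cache grow until the miss rate settles near 1%.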

Anyway, maybe it can be fiddled with. The main point is that the
common case (decrement, test the high bit, and do nothing) is very fast.

I'm a little unclear on David Miller's design constraint
"2) No cli()'s on any code path whatsoever".
Doesn't __get_free_pages do all kinds of locking (including possibly
synchronous disk I/O!), and doesn't *any* kernel memory allocator need
to call that sometimes? David, can you clarify?