The Slab Allocator (Was: Re: New dcache not using slab allocator?)

Mark Hemment (markhe@nextd.demon.co.uk)
Tue, 5 Aug 1997 14:33:25 +0100 (BST)


Hi,

As the implementor of the SLAB, I was quite shocked to see mm/simp.c
appear in the kernel source....

I'm currently implementing a new, 'experimental', Memory Management
sub-system in which the SLAB is heavily used. I've still got many problems
to solve, one of them being a super-fast front-end to the slab-allocator.

The current allocator is fast, and has a good (low) level of page
fragmentation[1] - which is v. important for long uptimes - and reasonable
internal fragmentation[2] for a large range of object sizes (different
packing strategies are used for different sizes, but this difference adds
very little extra weight for an allocation or release, and keeps memory
usage down).
[1] Page fragmentation is the distribution of allocated objects. Ideally,
an allocator wants to ensure the allocated objects come from a low number
of pages. This allows it to release 'cached' (delayed) pages back to the
MM sub-system. If the object distribution is poor, page-fragmentation is
poor.
[2] Internal fragmentation is the ratio of bytes used to manage an object
[internally] compared to the size of the object.

Keeping the page fragmentation low has a slight cost for an allocation,
and a slightly higher cost for a release. For caches with a usage
pattern where the number of allocated objects is roughly constant, there
is little point in trying to keep this fragmentation down.
So, my (still to be implemented) "fast front-end" uses the concept of a
'locally' free object (as in a Lazy Buddy Allocator). Released objects
are attached to the head of the locally-free list. Allocations remove
objects from the head. If the head is empty the back-end (which is the
current kmem_cache_alloc()) is used to find objects in the cache pool
(which is the current slab cache) or allocate (and construct) new
objects. Object reaping removes objects from the locally-free list into
the back-end before running kmem_cache_reap().
This front-end can cause page fragmentation, as objects on the
locally-free list are considered allocated as far as the back-end is
concerned - breaking the partial ordering which is used by the SLAB.
(The partial ordering is very important; it is what keeps the
page-fragmentation down!)
For small objects, with a high (but constant) usage pattern,
fragmentation is not a major problem. This is where a "fast front-end"
can be a win.
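To make the idea concrete, here is a rough userland sketch of such a
front-end (the names are made up, malloc() stands in for the real slab
back-end, and there's no locking or reaping):

```c
#include <stdlib.h>

/* Hypothetical cache with a locally-free list in front of a back-end
 * allocator; malloc() stands in for the real slab back-end.  The link
 * pointer is stored in the free object itself, so obj_size must be at
 * least sizeof(void *). */
struct local_cache {
    void  *free_head;   /* head of the locally-free object list */
    size_t obj_size;
};

static void *cache_alloc(struct local_cache *c)
{
    void *obj = c->free_head;
    if (obj) {                          /* fast path: pop the head */
        c->free_head = *(void **)obj;
        return obj;
    }
    return malloc(c->obj_size);         /* slow path: ask the back-end */
}

static void cache_free(struct local_cache *c, void *obj)
{
    /* Fast path: push onto the locally-free list.  The back-end still
     * considers this object allocated - hence the fragmentation risk
     * described above. */
    *(void **)obj = c->free_head;
    c->free_head = obj;
}
```

Note the LIFO behaviour: the most recently released object is handed
out first, which is also the object most likely to still be warm in the
h/w caches.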

I'm also investigating the use of per-engine (processor) caches backed
by a global cache. This is used in Mach's Zone allocator.
This scheme reduces SMP-lock contention (which is not a _huge_
problem inside the allocator), but most importantly it helps the h/w
caches - particularly when the system has per-engine L2 caches.
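A toy illustration of the two-level idea (again with made-up names; a
pthreads mutex stands in for the kernel's SMP lock, and malloc() for
the global slab cache):

```c
#include <pthread.h>
#include <stdlib.h>

#define NR_ENGINES 4    /* assumed number of processors */

/* Hypothetical per-engine front cache backed by a shared pool that is
 * guarded by a single lock.  The per-engine path takes no lock at all,
 * and tends to hand back objects that are warm in that engine's cache. */
struct engine_cache { void *head; };

static struct engine_cache percpu[NR_ENGINES];
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

static void *engine_alloc(int cpu, size_t size)
{
    struct engine_cache *ec = &percpu[cpu];
    void *obj = ec->head;
    if (obj) {                        /* hit: lock-free, cache-warm */
        ec->head = *(void **)obj;
        return obj;
    }
    pthread_mutex_lock(&global_lock); /* miss: fall back to global pool */
    obj = malloc(size);
    pthread_mutex_unlock(&global_lock);
    return obj;
}

static void engine_free(int cpu, void *obj)
{
    struct engine_cache *ec = &percpu[cpu];
    *(void **)obj = ec->head;         /* return to this engine's cache */
    ec->head = obj;
}
```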

Also, I have plans to improve the debugging support. The allocator
already has red-zoning and object-poisoning, and will soon have history
ring-buffers for improved debugging.
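For those unfamiliar with the techniques, something like the following
(a userland sketch - the magic values are invented here, not the
kernel's): red-zone words bracket each object to catch overruns, and
freed objects are filled with a poison byte to catch use-after-free.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed debug constants; real red-zone/poison values differ. */
#define RED_MAGIC   0x5A2CF071u
#define POISON_BYTE 0x5a

/* Allocate an object bracketed by red-zone words. */
static void *dbg_alloc(size_t size)
{
    uint32_t magic = RED_MAGIC;
    uint32_t *p = malloc(size + 2 * sizeof(uint32_t));
    if (!p)
        return NULL;
    p[0] = magic;                                   /* front red-zone */
    memcpy((char *)(p + 1) + size, &magic,
           sizeof(uint32_t));                       /* rear red-zone  */
    return p + 1;
}

/* Poison the object and check the red-zones; returns 0 if intact. */
static int dbg_free(void *obj, size_t size)
{
    uint32_t *p = (uint32_t *)obj - 1, rear;
    int overrun;

    memcpy(&rear, (char *)obj + size, sizeof(uint32_t));
    overrun = (p[0] != RED_MAGIC) || (rear != RED_MAGIC);
    memset(obj, POISON_BYTE, size);   /* catch later use-after-free */
    free(p);
    return overrun;
}
```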

Finally, I'm linking pages allocated by the slab-allocator into the
'experimental' memory-management reaping strategy. This improves over-all
fairness.

NOTE: When implementing the allocator, I was careful to make the
common code paths as short as possible. Also, most importantly, the
ordering of members in the internal structures was designed to reduce
pollution of the L1 h/w D-cache during an allocation (the D-cache is
normally much 'hotter' during an allocation than during a release,
which is why kmem_cache_alloc() is lighter than kmem_cache_free()).
When it makes sense, the kmem_slab_t structure is placed within the pages
allocated for the objects, reducing TLB pollution on some architectures -
and this slab management structure is also L1 cache coloured!
The internal wastage [of a slab] is used to give the objects a
cache-colour - so the memory is not [completely] wasted.
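As an illustration of the colouring arithmetic (names, structure and
the cache-line size are all assumed for this sketch): the bytes left
over after packing objects into a slab are spent as a per-slab starting
offset, so successive slabs place their objects on different L1 lines.

```c
#include <stddef.h>

#define CACHE_LINE 32   /* assumed L1 cache-line size */

/* Hypothetical per-cache colouring state. */
struct colour_info {
    unsigned int colours;   /* distinct offsets available */
    unsigned int next;      /* colour to use for the next slab */
};

/* Byte offset at which the next slab places its first object: cycle
 * through the offsets that fit inside the slab's internal wastage. */
static size_t next_colour_offset(struct colour_info *ci,
                                 size_t slab_bytes, size_t obj_size,
                                 unsigned int nr_objs)
{
    size_t wastage = slab_bytes - (size_t)nr_objs * obj_size;
    size_t off;

    ci->colours = (unsigned int)(wastage / CACHE_LINE) + 1;
    off = (ci->next % ci->colours) * CACHE_LINE;
    ci->next++;
    return off;
}
```

E.g. a 4096-byte slab holding 40 objects of 100 bytes has 96 bytes of
wastage, giving four colours: offsets 0, 32, 64 and 96, then wrapping.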

I believe the SLAB allocator is a fast, flexible, low-fragmentation
memory allocator with debugging support. I've yet to study, in detail,
the code for the SIMP, so I cannot yet pass full judgement on it. But it
does look to have high internal and page fragmentation. It also uses
[the same!] page-allocation order independent of object size. Its
order (2, which means 4 physically contiguous pages) places a strain on
the page allocation/reaping routines (I was criticised for the strain my
SLAB allocator placed, which is why I throttled it back. My current MM
work is designed to reduce/remove this). Using a high page order with
high page-fragmentation is not good.

I feel that a light, fast, front-end to the SLAB is the correct solution
- there should not be two memory allocators in the kernel.

Regards,

markhe

------------------------------------------------------------------
Mark Hemment, Unix/C Software Engineer (Contractor)
markhe@nextd.demon.co.uk http://www.nextd.demon.co.uk/
"Success has many fathers, failure is a B**TARD!" - anon
------------------------------------------------------------------