Re: [GIT PULL] Lockless SLUB slowpaths for v3.1-rc1

From: Christoph Lameter
Date: Mon Aug 01 2011 - 11:55:50 EST



The future plans that I have for performance improvements are:

1. Per-cpu partial lists.

The min_partial settings are halved by this approach so that there won't
be any excessive memory usage. Pages on per-cpu partial lists are frozen,
which means that the __slab_free path can avoid taking node locks for a
page that is cached by another processor. This yields another significant
performance gain in hackbench of up to 20%. The remaining work here is to
fine-tune the approach and clean up the patchset.
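
Roughly, the idea looks like the following userspace sketch. This is an
illustration only, not the actual patch: the struct layout and all the
names (put_cpu_partial, node_partial, PERCPU_PARTIAL_MAX) are invented
here. Frees to a frozen slab stay on the owner's private list and never
touch the node lock; only overflow falls back to the shared per-node
list.

/* build with: gcc -O2 -pthread percpu_partial.c */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define PERCPU_PARTIAL_MAX 4	/* cap keeps the memory overhead bounded */

struct slab {
	struct slab *next;
	int frozen;	/* frozen: owned by one cpu, no node lock needed */
};

/* shared per-node state, protected by a lock (the contended slow path) */
static struct slab *node_partial;
static pthread_mutex_t node_lock = PTHREAD_MUTEX_INITIALIZER;

/* per-"cpu" state: private to its owning thread, no locking required */
static __thread struct slab *cpu_partial;
static __thread int cpu_partial_count;

static void put_cpu_partial(struct slab *s)
{
	if (cpu_partial_count < PERCPU_PARTIAL_MAX) {
		/* fast path: park the slab on the private per-cpu list */
		s->frozen = 1;
		s->next = cpu_partial;
		cpu_partial = s;
		cpu_partial_count++;
		return;
	}
	/* overflow: fall back to the shared per-node list */
	pthread_mutex_lock(&node_lock);
	s->frozen = 0;
	s->next = node_partial;
	node_partial = s;
	pthread_mutex_unlock(&node_lock);
}

int main(void)
{
	struct slab *s = calloc(1, sizeof(*s));

	put_cpu_partial(s);
	printf("frozen=%d per-cpu slabs=%d\n", s->frozen, cpu_partial_count);
	return 0;
}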

2. Per-cpu full lists.

These will not be specific to a particular slab cache but shared among
all of them. This will reduce the need to keep empty slab pages on the
per-node partial lists and therefore also reduce memory consumption.

The per-cpu full lists will essentially be a caching layer for the page
allocator and will make slab acquisition and release as fast as the SLUB
fastpath for alloc and free (it uses the same this_cpu_cmpxchg_double
based approach). I basically gave up on fixing up the page allocator
fastpath after trying various approaches over the last few weeks. Maybe
the caching layer can be made available to other kernel subsystems that
need fast page access too.
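
For illustration, here is a userspace approximation of the cmpxchg
double technique. Again only a sketch under stated assumptions: the
kernel primitive, the real tid handling and the data layout all differ,
and struct pcp_cache and its helpers are invented for this example. The
freelist head is paired with a transaction id in a single aligned
16-byte slot so that one compare-and-swap covers both:

/* build with: gcc -O2 -mcx16 sketch.c -latomic */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct page {
	struct page *next;
};

/* head pointer and tid live together in one aligned 16-byte slot */
struct pcp_cache {
	struct page *head;
	uintptr_t tid;
} __attribute__((aligned(16)));

static _Thread_local struct pcp_cache cache;

static struct page *cache_pop(void)
{
	struct pcp_cache old, new;

	do {
		old = cache;	/* only the owner reads/writes this slot */
		if (!old.head)
			return NULL;	/* empty: fall back to the page allocator */
		new.head = old.head->next;
		new.tid = old.tid + 1;	/* bump tid so a stale update fails */
	} while (!__atomic_compare_exchange(&cache, &old, &new, 0,
					    __ATOMIC_ACQ_REL, __ATOMIC_RELAXED));
	return old.head;
}

static void cache_push(struct page *p)
{
	struct pcp_cache old, new;

	do {
		old = cache;
		p->next = old.head;
		new.head = p;
		new.tid = old.tid + 1;
	} while (!__atomic_compare_exchange(&cache, &old, &new, 0,
					    __ATOMIC_ACQ_REL, __ATOMIC_RELAXED));
}

int main(void)
{
	struct page *p = malloc(sizeof(*p));

	cache_push(p);
	printf("pushed %p, popped %p\n", (void *)p, (void *)cache_pop());
	return 0;
}

In the kernel the compare-and-swap can fail when an interrupt or a
migration to another processor slips in between the read and the update;
this single-threaded sketch only shows the head/tid pairing that makes
such interference detectable.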

The scaling issues that are left over are then those caused by:

1. The per-node lock taken for the partial lists.
This can be controlled by enlarging the per-cpu partial lists.

2. The necessity to go to the page allocator.
This will be tunable by configuring the caching layer.

3. Bouncing cachelines for __remote_free if multiple processors
enter __slab_free for the same page (demonstrated in the sketch
following this list).
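
To demonstrate the cacheline bouncing in point 3, here is a small
userspace sketch (illustrative only; the names are invented and this is
not kernel code): several threads free into one shared freelist head, so
every compare-and-swap pulls the same cacheline exclusive into the local
CPU cache and the line ping-pongs between processors.

/* build with: gcc -O2 -pthread bounce.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

#define NTHREADS 4
#define NFREES   (1 << 20)

struct object {
	struct object *next;
};

/* one freelist head shared by all threads: the bouncing cacheline */
static _Atomic(struct object *) freelist;

static void *freer(void *arg)
{
	struct object *pool = arg;

	for (long i = 0; i < NFREES; i++) {
		struct object *o = &pool[i];
		struct object *head = atomic_load(&freelist);

		/* each CAS pulls the line exclusive into this CPU's cache */
		do {
			o->next = head;
		} while (!atomic_compare_exchange_weak(&freelist, &head, o));
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];

	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, freer,
			       calloc(NFREES, sizeof(struct object)));
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);
	printf("done: %d threads freed into one shared list\n", NTHREADS);
	return 0;
}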



