[slub p2 0/4] SLUB: [RFC] Per cpu partial lists V2

From: Christoph Lameter
Date: Mon Jun 20 2011 - 11:33:37 EST


The following patchset applied on top of the lockless patchset V7. It
introduces per cpu partial lists which allow a performance increase of
around ~15 during contention for the nodelock (can be tested using
hackbench).

These lists help to avoid per nodelocking overhead. Allocator latency
could be further reduced by making these operations work without
disabling interrupts (like the fastpath and the free slowpath) as well as
implementing better ways of handling ther cpu array with partial pages.

I am still not satisfied with the cleanliness of the code after these
changes. Some review with suggestions as to how to restructure the
code given these changes in operations would be appreciated.

It is interesting to note that BSD has gone to a scheme with partial
pages only per cpu (source: Adrian). Transfer of cpu ownerships is
done using IPIs. Probably too much overhead for our taste. The use
of a few per cpu partial pages looks to be beneficial though.

Note that there is no performance gain when there is no contention.

Performance:

Before After
./hackbench 100 process 200000
Time: 2299.072 1742.454
./hackbench 100 process 20000
Time: 224.654 182.393
./hackbench 100 process 20000
Time: 227.126 182.780
./hackbench 100 process 20000
Time: 219.608 182.899
./hackbench 10 process 20000
Time: 21.769 18.756
./hackbench 10 process 20000
Time: 21.657 18.938
./hackbench 10 process 20000
Time: 23.193 19.537
./hackbench 1 process 20000
Time: 2.337 2.263
./hackbench 1 process 20000
Time: 2.223 2.271
./hackbench 1 process 20000
Time: 2.269 2.301


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/