[patch v2 0/5] percpu_counter: bug fix and enhancement

From: Shaohua Li
Date: Wed May 11 2011 - 11:40:01 EST


This patch set does two things.
1. Fix a bug on 32-bit systems. percpu_counter uses an s64 counter. Without
any locking, reading an s64 on a 32-bit system isn't safe and can cause bad
side effects.
2. Improve scalability of __percpu_counter_add. In some cases, _add can
cause heavy lock contention (see patch 4 for detailed information and data).
The patches remove the contention and speed it up a bit. The last post
(http://marc.info/?l=linux-kernel&m=130259547913607&w=2) simply used
atomic64 for percpu_counter, but Tejun pointed out this could cause
deviation in __percpu_counter_sum.
The new implementation uses lglock to protect percpu data. Each cpu has its
own private lock, which other cpus don't take. In this way _add doesn't need
to take the global lock anymore, and the deviation is removed. This is still
about 5x ~ 6x faster (not as fast as the original 7x, but still good) with
the workload mentioned in patch 4.
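
To make the locking scheme above concrete, here is a minimal userspace sketch.
It is not the kernel code: the names (sketch_counter, sketch_add, sketch_sum),
NR_SLOTS, BATCH, the atomic_flag spinlock and the atomic global count are all
assumptions made for illustration. It only shows the shape of the idea: _add
takes the caller's own lock, and _sum takes every lock so it sees a consistent
snapshot.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NR_SLOTS 4   /* hypothetical stand-in for the number of cpus */
#define BATCH    32  /* hypothetical fold threshold */

struct slot {
    atomic_flag lock;   /* private lock; the add path takes only its own */
    int32_t     count;  /* local delta not yet folded into the global count */
};

struct sketch_counter {
    _Atomic int64_t global;          /* assumption: folded with an atomic add */
    struct slot     slots[NR_SLOTS];
};

static void slot_lock(struct slot *s)
{
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ; /* spin until the holder releases the lock */
}

static void slot_unlock(struct slot *s)
{
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

void sketch_init(struct sketch_counter *c)
{
    atomic_init(&c->global, 0);
    for (int i = 0; i < NR_SLOTS; i++) {
        atomic_flag_clear(&c->slots[i].lock);
        c->slots[i].count = 0;
    }
}

/* Fast path: only the caller's own slot lock is taken, never a global one. */
void sketch_add(struct sketch_counter *c, int slot, int32_t amount)
{
    struct slot *s = &c->slots[slot];
    int64_t n;

    slot_lock(s);
    n = (int64_t)s->count + amount;
    if (n >= BATCH || n <= -BATCH) {
        /* Fold the local delta into the global count under the slot lock. */
        atomic_fetch_add(&c->global, n);
        s->count = 0;
    } else {
        s->count = (int32_t)n;
    }
    slot_unlock(s);
}

/* Slow path: hold every slot lock so no add can run while we sum. */
int64_t sketch_sum(struct sketch_counter *c)
{
    int64_t total;

    for (int i = 0; i < NR_SLOTS; i++)
        slot_lock(&c->slots[i]);

    total = atomic_load(&c->global);
    for (int i = 0; i < NR_SLOTS; i++)
        total += c->slots[i].count;

    for (int i = NR_SLOTS - 1; i >= 0; i--)
        slot_unlock(&c->slots[i]);

    return total;
}
```

Because a fold happens while the slot lock is held, and the sum holds all slot
locks, the sum in this sketch can never observe a delta that has left the slot
but not yet reached the global count (or the reverse).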

patch 1 fixes the s64 read bug on 32-bit systems for UP
patch 2 converts lglock so it can be used by dynamically allocated
structures. A later patch will use lglock for percpu_counter
patch 3,4 fix the s64 read bug on 32-bit systems for MP. They also improve
the scalability of __percpu_counter_add.
patch 5 is from Christoph Lameter and makes the __percpu_counter_add fastpath
preemptless. I added it here because I converted percpu_counter to use
lglock. All bugs are mine.
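
For readers unfamiliar with the s64 read problem that patches 1, 3 and 4
address, here is a small single-threaded model of it. It is an illustration
only: split_s64 and torn_read_demo are hypothetical names, and the writer's
store order (low word first) is an assumption. It replays one unlucky
interleaving in which the reader runs between the writer's two 32-bit stores.

```c
#include <stdint.h>

/* Hypothetical model of an s64 held as two 32-bit words, as on a 32-bit cpu. */
struct split_s64 {
    uint32_t lo;
    uint32_t hi;
};

/* Reassemble the 64-bit value from its two halves. */
int64_t split_read(const struct split_s64 *v)
{
    return (int64_t)(((uint64_t)v->hi << 32) | v->lo);
}

/*
 * Replay one interleaving: the writer stores the low word, the reader runs,
 * then the writer stores the high word. Returns what the reader saw.
 */
int64_t torn_read_demo(int64_t old_val, int64_t new_val)
{
    struct split_s64 v;
    int64_t seen;

    v.lo = (uint32_t)old_val;
    v.hi = (uint32_t)((uint64_t)old_val >> 32);

    v.lo = (uint32_t)new_val;                    /* writer: first half */
    seen = split_read(&v);                       /* reader: runs in between */
    v.hi = (uint32_t)((uint64_t)new_val >> 32);  /* writer: second half */

    return seen;
}
```

With old_val = 0xFFFFFFFF and new_val = 0x100000000 (an increment by one that
carries into the high word), the reader in this model sees a value that is
neither the old nor the new one.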

Comments and suggestions are welcome!

Thanks,
Shaohua