Re: [RFC PATCH 1/2] lib/percpu-list: Per-cpu list with associated per-cpu locks

From: Waiman Long
Date: Wed Feb 17 2016 - 12:42:16 EST


On 02/17/2016 12:18 PM, Peter Zijlstra wrote:
On Wed, Feb 17, 2016 at 12:12:57PM -0500, Waiman Long wrote:
On 02/17/2016 11:27 AM, Christoph Lameter wrote:
On Wed, 17 Feb 2016, Waiman Long wrote:

I know we can use RCU for singly linked list, but I don't think we can use
that for doubly linked list as there is no easy way to make atomic changes to
both prev and next pointers simultaneously unless you are talking about 16b
cmpxchg which is only supported in some architectures.
But it's supported in the most important architectures. You can fall back to
spinlocks on the ones that do not support it.

I guess with some limitations on how the lists can be traversed, we may be
able to do that with RCU without lock. However, that will make the code more
complex and harder to verify. Given that in both my testing and Dave's,
contention on list insertion and deletion is almost gone from the perf
profile where it used to be a bottleneck, is it really worth the effort to
do such a conversion?
My initial concern was the preempt disable delay introduced by holding
the spinlock over the entire iteration.

There is no saying how many elements are on that list and there is no
lock break.

But preempt_disable() is called at the beginning of the spin_lock() call anyway. So the additional preempt_disable() in percpu_list_add() is just there to cover the this_cpu_ptr() call and make sure that the CPU number doesn't change. So we are talking about a few ns at most here.

Actually, I think I can remove the preempt_disable() and preempt_enable() calls, as we just need to put the list entry in one of the per-cpu lists. It doesn't need to be the list of the CPU that the current task is running on.
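
To illustrate why a stale CPU number is harmless, here is a rough userspace sketch of the idea, not the code from the patch. All names (slot_head, slot_node, slot_list_add() and so on) are made up for illustration, pthread mutexes stand in for the per-cpu spinlocks, and the "hint" argument plays the role of a CPU number that may already be out of date. Each node records which sublist it went onto, so deletion takes the right lock no matter where the caller runs:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_SLOTS 4	/* hypothetical stand-in for the number of CPUs */

struct slot_head;

struct slot_node {
	struct slot_node *prev, *next;
	struct slot_head *owner;	/* sublist (and lock) holding this node */
};

struct slot_head {
	struct slot_node list;		/* circular sentinel node */
	pthread_mutex_t lock;		/* models the per-cpu spinlock */
	size_t count;
};

static struct slot_head slots[NR_SLOTS];

static void slots_init(void)
{
	for (int i = 0; i < NR_SLOTS; i++) {
		int rc = pthread_mutex_init(&slots[i].lock, NULL);

		assert(rc == 0);
		(void)rc;
		slots[i].list.prev = slots[i].list.next = &slots[i].list;
		slots[i].list.owner = &slots[i];
		slots[i].count = 0;
	}
}

/*
 * Add a node to whichever sublist the hint selects. The hint may be
 * stale; correctness only depends on taking that sublist's lock.
 */
static void slot_list_add(struct slot_node *node, unsigned int hint)
{
	struct slot_head *head = &slots[hint % NR_SLOTS];

	pthread_mutex_lock(&head->lock);
	node->owner = head;
	node->next = head->list.next;
	node->prev = &head->list;
	head->list.next->prev = node;
	head->list.next = node;
	head->count++;
	pthread_mutex_unlock(&head->lock);
}

/* Remove a node using the lock recorded at insertion time. */
static void slot_list_del(struct slot_node *node)
{
	struct slot_head *head = node->owner;

	pthread_mutex_lock(&head->lock);
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->prev = node->next = node;
	node->owner = NULL;
	head->count--;
	pthread_mutex_unlock(&head->lock);
}

/* Sum the sublist counts, locking one sublist at a time. */
static size_t slot_list_total(void)
{
	size_t total = 0;

	for (int i = 0; i < NR_SLOTS; i++) {
		pthread_mutex_lock(&slots[i].lock);
		total += slots[i].count;
		pthread_mutex_unlock(&slots[i].lock);
	}
	return total;
}
```

Calling slots_init() once and then slot_list_add(&node, hint) from any thread works with any hint value; the choice of sublist only affects how evenly the entries (and the lock contention) are spread out.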

Cheers,
Longman