Re: [PATCH v2] percpu-refcount: Use normal instead of RCU-sched

From: Dennis Zhou
Date: Sat Nov 16 2019 - 23:11:24 EST


On Fri, Nov 08, 2019 at 06:35:53PM +0100, Sebastian Andrzej Siewior wrote:
> This is a revert of commit
> a4244454df129 ("percpu-refcount: use RCU-sched insted of normal RCU")
>
> which claims the only reason for using RCU-sched is
> "rcu_read_[un]lock() … are slightly more expensive than preempt_disable/enable()"
>
> and
> "As the RCU critical sections are extremely short, using sched-RCU
> shouldn't have any latency implications."
>
> The problem with using RCU-sched here is that it disables preemption and
> the release callback (called from percpu_ref_put_many()) must not
> acquire any sleeping locks like spinlock_t. This breaks PREEMPT_RT
> because some of the users acquire spinlock_t locks in their callbacks.
>
> Using rcu_read_lock() on PREEMPTION=n kernels is not any different
> compared to rcu_read_lock_sched(). On PREEMPTION=y kernels there are
> already performance issues due to additional preemption points.
> Looking at the code, the rcu_read_lock() is just an increment and unlock
> is almost just a decrement unless there is something special to do. Both
> are functions while disabling preemption is inlined.
> Doing a small benchmark, the minimal amount of time required was mostly
> the same. The average time required was higher due to the higher MAX
> value (which could be preemption). With DEBUG_PREEMPT=y it is
> rcu_read_lock_sched() that takes a little longer due to the additional
> debug code.
>
> Convert back to normal RCU.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> ---
> On 2019-11-07 12:36:53 [-0500], Dennis Zhou wrote:
> > > some RCU section here invoke callbacks which acquire spinlock_t locks.
> > > This does not work on RT with disabled preemption.
> > >
> >
> > Yeah, so adding a bit in the commit message about why it's an issue for
> > RT kernels with disabled preemption as I don't believe this is an issue
> > for non-RT kernels.
>
> I realized that I had it partly in the commit message, so I rewrote the
> second paragraph, hopefully covering it all more explicitly now.
>
> v1→v2: Slightly rewriting the second paragraph regarding RT
> implications.
>
> include/linux/percpu-refcount.h | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
> index 7aef0abc194a2..390031e816dcd 100644
> --- a/include/linux/percpu-refcount.h
> +++ b/include/linux/percpu-refcount.h
> @@ -186,14 +186,14 @@ static inline void percpu_ref_get_many(struct percpu_ref *ref, unsigned long nr)
> {
> unsigned long __percpu *percpu_count;
>
> - rcu_read_lock_sched();
> + rcu_read_lock();
>
> if (__ref_is_percpu(ref, &percpu_count))
> this_cpu_add(*percpu_count, nr);
> else
> atomic_long_add(nr, &ref->count);
>
> - rcu_read_unlock_sched();
> + rcu_read_unlock();
> }
>
> /**
> @@ -223,7 +223,7 @@ static inline bool percpu_ref_tryget(struct percpu_ref *ref)
> unsigned long __percpu *percpu_count;
> bool ret;
>
> - rcu_read_lock_sched();
> + rcu_read_lock();
>
> if (__ref_is_percpu(ref, &percpu_count)) {
> this_cpu_inc(*percpu_count);
> @@ -232,7 +232,7 @@ static inline bool percpu_ref_tryget(struct percpu_ref *ref)
> ret = atomic_long_inc_not_zero(&ref->count);
> }
>
> - rcu_read_unlock_sched();
> + rcu_read_unlock();
>
> return ret;
> }
> @@ -257,7 +257,7 @@ static inline bool percpu_ref_tryget_live(struct percpu_ref *ref)
> unsigned long __percpu *percpu_count;
> bool ret = false;
>
> - rcu_read_lock_sched();
> + rcu_read_lock();
>
> if (__ref_is_percpu(ref, &percpu_count)) {
> this_cpu_inc(*percpu_count);
> @@ -266,7 +266,7 @@ static inline bool percpu_ref_tryget_live(struct percpu_ref *ref)
> ret = atomic_long_inc_not_zero(&ref->count);
> }
>
> - rcu_read_unlock_sched();
> + rcu_read_unlock();
>
> return ret;
> }
> @@ -285,14 +285,14 @@ static inline void percpu_ref_put_many(struct percpu_ref *ref, unsigned long nr)
> {
> unsigned long __percpu *percpu_count;
>
> - rcu_read_lock_sched();
> + rcu_read_lock();
>
> if (__ref_is_percpu(ref, &percpu_count))
> this_cpu_sub(*percpu_count, nr);
> else if (unlikely(atomic_long_sub_and_test(nr, &ref->count)))
> ref->release(ref);
>
> - rcu_read_unlock_sched();
> + rcu_read_unlock();
> }
>
> /**
> --
> 2.24.0
>
>

Sorry for sitting on this for so long. I've applied it to for-5.5.

Thanks,
Dennis