Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints

From: Frederic Weisbecker
Date: Mon Jun 08 2020 - 08:57:24 EST


On Thu, Jun 04, 2020 at 09:36:55AM -0700, Paul E. McKenney wrote:
> On Thu, Jun 04, 2020 at 01:41:22PM +0200, Frederic Weisbecker wrote:
> > On Fri, May 22, 2020 at 10:57:39AM -0700, Paul E. McKenney wrote:
> > > On Wed, May 20, 2020 at 08:29:49AM -0400, Joel Fernandes wrote:
> > > > Reviewed-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> > >
> > > Thank you for looking this over, Joel!
> > >
> > > Is it feasible to make rcu_nocb_lock*() and rcu_nocb_unlock*() "do the
> > > right thing", even when things are changing? If it is feasible, that
> > > would prevent any number of "interesting" copy-pasta and "just now became
> > > common code" bugs down the road.
> >
> > This won't be pretty:
> >
> > locked = rcu_nocb_lock();
> > rcu_nocb_unlock(locked);
>
> I was thinking in terms of a bit in the rcu_data structure into
> which rcu_nocb_lock() and friends stored the status, and from which
> rcu_nocb_unlock() and friends retrieved that same status. Sort of like
> how preemptible RCU uses the ->rcu_read_lock_nesting field in task_struct.
>
> As noted, this does require reworking the hotplug code to avoid the
> current holding of two such locks concurrently, which I am happy to do
> if that helps.
>
> Or am I missing a subtle (or not-so-subtle) twist here?

So, during a CPU round, the nocb-gp kthread locks the corresponding rdp-nocb
and then ignores if it is not offloaded. Surely there is a smarter ways to do
that though. Well that's something we'll need to optimize at some point anyway.

Also we must then make sure that the nocb timers won't ever fire while we
switch to offloaded state or they may fiddle with internals without locking
nocb.

>
> > And anyway we still want to unconditionally lock on many places,
> > regardless of the offloaded state. I don't know how we could have
> > a magic helper do the unconditional lock on some places and the
> > conditional on others.
>
> I was assuming (perhaps incorrectly) that an intermediate phase between
> not-offloaded and offloaded would take care of all of those cases.

Perhaps partly but I fear that won't be enough.

>
> > Also the point of turning the lock helpers into primitives is to make
> > it clearer as to where we really need unconditional locking and where
> > we allow it to be conditional. A gift to reviewers :-)
>
> Unless and until someone does a copy-pasta, thus unconditionally
> doing the wrong thing. ;-)

Yeah but the cost is more complicated code and synchronization to
maintain the use of those all-in-one locking APIs everywhere. And
current state is already not that simple :-)

Perhaps we should rename rcu_nocb_lock() into rcu_nocb_lock_cond() to
prevent from accidents?

Also I've been thinking that rcu_nocb_lock() should meet any of these
requirements:

* hotplug is locked
* rcu barrier is locked
* rnp is locked

Because checking the offloaded state (when nocb isn't locked yet) of
an rdp without any of the above locks held is racy. And that should
be easy to check and prevent from copy-pasta accidents.

What do you think?