Re: [PATCH rcu 3/3] srcu: Explain why callbacks invocations can't run concurrently

From: Joel Fernandes
Date: Wed Dec 13 2023 - 14:01:45 EST


On Wed, Dec 13, 2023 at 1:55 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Wed, Dec 13, 2023 at 01:35:22PM -0500, Joel Fernandes wrote:
> > On Wed, Dec 13, 2023 at 12:52 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > >
> > > On Wed, Dec 13, 2023 at 09:27:09AM -0500, Joel Fernandes wrote:
> > > > On Tue, Dec 12, 2023 at 12:48 PM Neeraj Upadhyay (AMD)
> > > > <neeraj.iitr10@xxxxxxxxx> wrote:
> > > > >
> > > > > From: Frederic Weisbecker <frederic@xxxxxxxxxx>
> > > > >
> > > > > If an SRCU barrier is queued while callbacks are running and a new
> > > > > callbacks invocator for the same sdp were to run concurrently, the
> > > > > RCU barrier might execute too early. As this requirement is non-obvious,
> > > > > make sure to keep a record.
> > > > >
> > > > > Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
> > > > > Reviewed-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> > > > > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> > > > > Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@xxxxxxxxx>
> > > > > ---
> > > > > kernel/rcu/srcutree.c | 6 ++++++
> > > > > 1 file changed, 6 insertions(+)
> > > > >
> > > > > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > > > > index 2bfc8ed1eed2..0351a4e83529 100644
> > > > > --- a/kernel/rcu/srcutree.c
> > > > > +++ b/kernel/rcu/srcutree.c
> > > > > @@ -1715,6 +1715,11 @@ static void srcu_invoke_callbacks(struct work_struct *work)
> > > > > WARN_ON_ONCE(!rcu_segcblist_segempty(&sdp->srcu_cblist, RCU_NEXT_TAIL));
> > > > > rcu_segcblist_advance(&sdp->srcu_cblist,
> > > > > rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq));
> > > > > + /*
> > > > > + * Although this function is theoretically re-entrant, concurrent
> > > > > + * callbacks invocation is disallowed to avoid executing an SRCU barrier
> > > > > + * too early.
> > > > > + */
> > > >
> > > > Side comment:
> > > > I guess even without the barrier reasoning, it is best not to allow
> > > > concurrent CB execution anyway since it diverges from the behavior of
> > > > straight RCU :)
> > >
> > > Good point!
> > >
> > > But please do not forget item 12 on the list in checklist.rst. ;-)
> > > (Which I just updated to include the other call_rcu*() functions.)
> >
> > I think this is more so now with recent kernels (with the dynamic nocb
> > switch) than with older kernels right? I haven't kept up with the
> > checklist recently (which is my bad).
>
> You are quite correct! But even before this, I was saying that
> lack of same-CPU callback concurrency was an accident of the current
> implementation rather than a guarantee. For example, there might come
> a time when RCU needs to respond to callback flooding with concurrent
> execution of the flooded CPU's callbacks. Or not, but we do need to
> keep this option open.

Got it, reminds me to focus on requirements as well along with implementation.

> > My understanding comes from the fact that the RCU barrier depends on
> > callbacks on the same CPU executing in order with straight RCU
> > otherwise it breaks. Hence my comment. But as you pointed out, that's
> > outdated knowledge.
>
> That is still one motivation for ordered execution of callbacks. For the
> dynamic nocb switch, we could have chosen to make rcu_barrier() place
> a callback on both lists, but we instead chose to exclude rcu_barrier()
> calls during the switch.

Right!

> > I should just shut up and hide in shame now.
>
> No need for that! After all, one motivation for Requirements.rst was
> to help me keep track of all this stuff.

Thanks!

- Joel