Re: One potential issue with concurrent execution of RCU callbacks...

From: Paul E. McKenney
Date: Wed Dec 09 2020 - 19:52:08 EST


On Wed, Dec 09, 2020 at 10:14:49AM +0800, Boqun Feng wrote:
> Hi Frederic,
>
> On Tue, Dec 08, 2020 at 11:04:38PM +0100, Frederic Weisbecker wrote:
> > On Tue, Dec 08, 2020 at 10:24:09AM -0800, Paul E. McKenney wrote:
> > > > It reduces the code scope running with BH disabled.
> > > > Also narrowing down helps to understand what it actually protects.
> > >
> > > I thought that you would call out unnecessarily delaying other softirq
> > > handlers. ;-)
> > >
> > > But if such delays are a problem (and they might well be), then to
> > > avoid them on non-rcu_nocb CPUs would instead/also require changing the
> > > early-exit checks to check for other pending softirqs to the existing
> > > checks involving time, need_resched, and idle. At which point, entering and
> > > exiting BH-disabled again doesn't help, other than your point about the
> > > difference in BH-disabled scopes on rcu_nocb and non-rcu_nocb CPUs.
> >
> > Wise observation!
> >
> > >
> > > Would it make sense to exit rcu_do_batch() if more than some amount
> > > of time had elapsed and there was some non-RCU softirq pending?
> > >
> > > My guess is that the current tlimit checks in rcu_do_batch() make this
> > > unnecessary.
> >
> > Right and nobody has complained about it so far.
> >
> > But I should add a comment explaining the reason for the BH-disabled
> > section in my series.
> >
>
> Some background for the original question: I'm revisiting the wait
> context checking feature of lockdep (which can detect bugs like
> acquiring a spinlock_t lock inside a raw_spinlock_t), I've post my first
> version:
>
> https://lore.kernel.org/lkml/20201208103112.2838119-1-boqun.feng@xxxxxxxxx/
>
> , and will surely copy you in the next version ;-)
>
> The reason I asked for the RCU callback context requirement is that we
> have the virtual lock (rcu_callback_map) that marks a RCU callback
> context, so if RCU callback contexts have special restrictions on the
> locking usage inside, we can use the wait context checking to do the
> check (like what I did in the patch #3 of the above series).
>
> My current summary is that since in certain configs (use_softirq is true
> and nocb is disabled) RCU callbacks are executed in a softirq context,
> so the least requirement for any RCU callbacks is they need to obey the
> rules in softirq contexts. And yes, I'm aware that in some configs, RCU
> callbacks are not executed in a softirq context (sometimes, even the BH
> is not disabled), but we need to make all the callback work in the
> "worst" (or strictest) case (callbacks executing in softirq contexts).
> Currently, the effect of using wait context for rcu_callback_map in my
> patchset is that lockdep will complain if a RCU callback use a mutex or
> other sleepable locks, but using spinlock_t (even in PREEMPT_RT) won't
> cause lockdep to complain. Am I getting this correct?

It matches what I know. And yes, in PREEMPT_RT, softirq is preemptible,
allowing spinlock_t to be used, but there are restrictions that
lopckdep enforces. I am not going to claim to remember the exact set
of restrictions.

Thanx, Paul