Re: [tip: locking/core] lockdep: Fix lockdep recursion

From: Paul E. McKenney
Date: Wed Oct 14 2020 - 23:41:32 EST


On Wed, Oct 14, 2020 at 04:55:53PM -0700, Paul E. McKenney wrote:
> On Thu, Oct 15, 2020 at 12:39:54AM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 14, 2020 at 03:11:52PM -0700, Paul E. McKenney wrote:
> > > On Wed, Oct 14, 2020 at 11:53:19PM +0200, Peter Zijlstra wrote:
> > > > On Wed, Oct 14, 2020 at 11:34:05AM -0700, Paul E. McKenney wrote:
> > > > > commit 7deaa04b02298001426730ed0e6214ac20d1a1c1
> > > > > Author: Paul E. McKenney <paulmck@xxxxxxxxxx>
> > > > > Date: Tue Oct 13 12:39:23 2020 -0700
> > > > >
> > > > > rcu: Prevent lockdep-RCU splats on lock acquisition/release
> > > > >
> > > > > The rcu_cpu_starting() and rcu_report_dead() functions transition the
> > > > > current CPU between online and offline state from an RCU perspective.
> > > > > Unfortunately, this means that the rcu_cpu_starting() function's lock
> > > > > acquisition and the rcu_report_dead() function's lock releases happen
> > > > > while the CPU is offline from an RCU perspective, which can result in
> > > > > lockdep-RCU splats about using RCU from an offline CPU. In reality,
> > > > > aside from the splats, both transitions are safe because a new grace
> > > > > period cannot start until these functions release their locks.
> > > >
> > > > But we call the trace_* crud before we acquire the lock. Are you sure
> > > > that's a false-positive?
> > >
> > > You lost me on this one.
> > >
> > > I am assuming that you are talking about rcu_cpu_starting(), because
> > > that is the one where RCU is not initially watching, that is, the
> > > case where tracing before the lock acquisition would be a problem.
> > > You cannot be talking about rcu_cpu_starting() itself, because it does
> > > not do any tracing before acquiring the lock. But if you are talking
> > > about the caller of rcu_cpu_starting(), then that caller should invoke
> > > rcu_cpu_starting() before doing any tracing. But that would be covered
> > > by the other patch earlier in this thread, the one proposing to move
> > > the call to rcu_cpu_starting() much earlier in CPU bringup.
> > >
> > > So what am I missing here?
> >
> > rcu_cpu_starting();
> >   raw_spin_lock_irqsave();
> >     local_irq_save();
> >     preempt_disable();
> >     spin_acquire()
> >       lock_acquire()
> >         trace_lock_acquire() <--- *whoopsie-doodle*
> >         /* uses RCU for tracing */
> >     arch_spin_lock_flags() <--- the actual spinlock
>
> Gah! Idiot here left out the most important part, so good catch!!!
> Much easier this way than finding out about it the hard way...
>
> I should have asked myself harder questions earlier today about moving
> the counter from the rcu_node structure to the rcu_data structure.
>
> Perhaps something like the following untested patch on top of the
> earlier patch?

Except that this is also subtly flawed. The delay cannot be applied at
rcu_gp_cleanup() time because, by the time we are working on the last
leaf rcu_node structure, callbacks might already have started being
invoked on CPUs corresponding to the earlier leaf rcu_node structures.

So the (untested) patch below (on top of the other two) moves the delay
to rcu_gp_init(), specifically to the first loop in that function, which
traverses only the leaf rcu_node structures in order to handle CPU hotplug.

Hopefully getting closer!

Oh, and the second smp_mb() added to rcu_gp_init() is probably
redundant given the full barrier implied by the later call to
raw_spin_lock_irq_rcu_node(). But one thing at a time...

Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8b5215e..5904b63 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1725,6 +1725,7 @@ static void rcu_strict_gp_boundary(void *unused)
  */
 static bool rcu_gp_init(void)
 {
+	unsigned long firstseq;
 	unsigned long flags;
 	unsigned long oldmask;
 	unsigned long mask;
@@ -1768,6 +1769,12 @@ static bool rcu_gp_init(void)
 	 */
 	rcu_state.gp_state = RCU_GP_ONOFF;
 	rcu_for_each_leaf_node(rnp) {
+		smp_mb(); // Pair with barriers used when updating ->ofl_seq to odd values.
+		firstseq = READ_ONCE(rnp->ofl_seq);
+		if (firstseq & 0x1)
+			while (firstseq == smp_load_acquire(&rnp->ofl_seq))
+				schedule_timeout_idle(1); // Can't wake unless RCU is watching.
+		smp_mb(); // Pair with barriers used when updating ->ofl_seq to even values.
 		raw_spin_lock(&rcu_state.ofl_lock);
 		raw_spin_lock_irq_rcu_node(rnp);
 		if (rnp->qsmaskinit == rnp->qsmaskinitnext &&
@@ -1982,7 +1989,6 @@ static void rcu_gp_fqs_loop(void)
 static void rcu_gp_cleanup(void)
 {
 	int cpu;
-	unsigned long firstseq;
 	bool needgp = false;
 	unsigned long gp_duration;
 	unsigned long new_gp_seq;
@@ -2020,12 +2026,6 @@ static void rcu_gp_cleanup(void)
 	new_gp_seq = rcu_state.gp_seq;
 	rcu_seq_end(&new_gp_seq);
 	rcu_for_each_node_breadth_first(rnp) {
-		smp_mb(); // Pair with barriers used when updating ->ofl_seq to odd values.
-		firstseq = READ_ONCE(rnp->ofl_seq);
-		if (firstseq & 0x1)
-			while (firstseq == smp_load_acquire(&rnp->ofl_seq))
-				schedule_timeout_idle(1); // Can't wake unless RCU is watching.
-		smp_mb(); // Pair with barriers used when updating ->ofl_seq to even values.
 		raw_spin_lock_irq_rcu_node(rnp);
 		if (WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp)))
 			dump_blkd_tasks(rnp, 10);