Re: [PATCH v8 01/13] rcu: Fix missing nocb gp wake on rcu_barrier()

From: Paul E. McKenney
Date: Fri Oct 14 2022 - 10:21:42 EST


On Tue, Oct 11, 2022 at 06:01:30PM +0000, Joel Fernandes (Google) wrote:
> From: Frederic Weisbecker <frederic@xxxxxxxxxx>
>
> Upon entraining a callback to a NOCB CPU, no further wake up is
> issued on the corresponding nocb_gp kthread. As a result, the callback
> and all the subsequent ones on that CPU may be ignored, at least until
> an RCU_NOCB_WAKE_FORCE timer is ever armed or another NOCB CPU belonging
> to the same group enqueues a callback on an empty queue.
>
> Here is a possible bad scenario:
>
> 1) CPU 0 is NOCB unlike all other CPUs.
> 2) CPU 0 queues a callback

Call it CB1.

> 2) The grace period related to that callback elapses
> 3) The callback is moved to the done list (but is not invoked yet),
> there are no more pending callbacks for CPU 0

So CB1 is on ->cblist waiting to be invoked, correct?

> 4) CPU 1 calls rcu_barrier() and sends an IPI to CPU 0
> 5) CPU 0 entrains the callback but doesn't wake up nocb_gp

And CB1 must still be there because otherwise the IPI handler would not
have entrained the callback, correct? If so, we have both CB1 and the
rcu_barrier() callback (call it CB2) in ->cblist, but on the done list.

> 6) CPU 1 blocks forever, unless CPU 0 ever queues enough further
> callbacks to arm an RCU_NOCB_WAKE_FORCE timer.

Except that -something- must have already been prepared to wake up in
order to invoke CB1. And that something would invoke CB2 along with CB1,
given that they are both on the done list. If there is no such wakeup
already, then the hang could occur with just CB1, without the help of CB2.

> Make sure the necessary wake up is produced whenever necessary.

I am not seeing that the wakeup is needed in this case.

So what am I missing here?

> This is also required to make sure lazy callbacks in future patches
> don't end up making rcu_barrier() wait for multiple seconds.

But I do see that the wakeup is needed in the lazy case, and if I remember
correctly, the ten-second rcu_barrier() delay really did happen. If I
understand correctly, for this to happen, all of the callbacks must be
in the bypass list, that is, ->cblist must be empty.
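
To make the two cases concrete, here is a toy user-space sketch of the
was_alldone/wake_nocb logic from the patch below. Everything prefixed with
toy_ or SEG_ is a hypothetical stand-in, not a kernel type or function. The
sketch assumes that entraining appends to the last non-empty segment of
->cblist and that flushing the bypass list lands its callbacks in the final
segment; both are a reading of the code, not something this thread states.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the ->cblist segments; not the kernel's definitions. */
enum { SEG_DONE, SEG_WAIT, SEG_NEXT_READY, SEG_NEXT, NSEGS };

struct toy_cblist {
	int seglen[NSEGS];	/* number of callbacks in each segment */
	int bypass;		/* number of callbacks parked in the bypass list */
};

/* Total callbacks on the main list (the bypass list is not counted). */
static int toy_n_cbs(const struct toy_cblist *cl)
{
	int n = 0;

	for (int i = 0; i < NSEGS; i++)
		n += cl->seglen[i];
	return n;
}

/* "Pending" means queued past the done segment, so still needing a GP. */
static bool toy_pend_cbs(const struct toy_cblist *cl)
{
	for (int i = SEG_WAIT; i < NSEGS; i++)
		if (cl->seglen[i])
			return true;
	return false;
}

/* Assumption: flushed bypass callbacks land in the last segment. */
static void toy_flush_bypass(struct toy_cblist *cl)
{
	assert(cl->bypass >= 0);
	cl->seglen[SEG_NEXT] += cl->bypass;
	cl->bypass = 0;
}

/* Assumption: entrain into the last non-empty segment, fail if list empty. */
static bool toy_entrain(struct toy_cblist *cl)
{
	int i;

	if (toy_n_cbs(cl) == 0)
		return false;
	for (i = SEG_NEXT; i > SEG_DONE; i--)
		if (cl->seglen[i])
			break;
	cl->seglen[i]++;
	return true;
}

/* Returns the patch's wake decision: was_alldone && now-pending. */
static bool toy_barrier_entrain(struct toy_cblist *cl, bool offloaded)
{
	bool was_alldone = offloaded && !toy_pend_cbs(cl);

	toy_flush_bypass(cl);
	toy_entrain(cl);
	return was_alldone && toy_pend_cbs(cl);
}
```

Under those assumptions, a list holding only CB1 in the done segment gets
CB2 entrained into that same done segment, so nothing becomes pending and the
model requests no wakeup. A list that is empty apart from bypass callbacks
ends up with pending callbacks after the flush, so the model does request one.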

So has the scenario in steps 1-6 above actually happened in the absence
of lazy callbacks?

Thanx, Paul

> Reported-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> Fixes: 5d6742b37727 ("rcu/nocb: Use rcu_segcblist for no-CBs CPUs")
> Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
> Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> ---
> kernel/rcu/tree.c | 6 ++++++
> kernel/rcu/tree.h | 1 +
> kernel/rcu/tree_nocb.h | 5 +++++
> 3 files changed, 12 insertions(+)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 5ec97e3f7468..dc1c502216c7 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3894,6 +3894,8 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
>  {
>  	unsigned long gseq = READ_ONCE(rcu_state.barrier_sequence);
>  	unsigned long lseq = READ_ONCE(rdp->barrier_seq_snap);
> +	bool wake_nocb = false;
> +	bool was_alldone = false;
>
>  	lockdep_assert_held(&rcu_state.barrier_lock);
>  	if (rcu_seq_state(lseq) || !rcu_seq_state(gseq) || rcu_seq_ctr(lseq) != rcu_seq_ctr(gseq))
> @@ -3902,6 +3904,7 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
>  	rdp->barrier_head.func = rcu_barrier_callback;
>  	debug_rcu_head_queue(&rdp->barrier_head);
>  	rcu_nocb_lock(rdp);
> +	was_alldone = rcu_rdp_is_offloaded(rdp) && !rcu_segcblist_pend_cbs(&rdp->cblist);
>  	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
>  	if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) {
>  		atomic_inc(&rcu_state.barrier_cpu_count);
> @@ -3909,7 +3912,10 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
>  		debug_rcu_head_unqueue(&rdp->barrier_head);
>  		rcu_barrier_trace(TPS("IRQNQ"), -1, rcu_state.barrier_sequence);
>  	}
> +	wake_nocb = was_alldone && rcu_segcblist_pend_cbs(&rdp->cblist);
>  	rcu_nocb_unlock(rdp);
> +	if (wake_nocb)
> +		wake_nocb_gp(rdp, false);
>  	smp_store_release(&rdp->barrier_seq_snap, gseq);
>  }
>
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index d4a97e40ea9c..925dd98f8b23 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -439,6 +439,7 @@ static void zero_cpu_stall_ticks(struct rcu_data *rdp);
>  static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp);
>  static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
>  static void rcu_init_one_nocb(struct rcu_node *rnp);
> +static bool wake_nocb_gp(struct rcu_data *rdp, bool force);
>  static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>  				  unsigned long j);
>  static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index f77a6d7e1356..094fd454b6c3 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -1558,6 +1558,11 @@ static void rcu_init_one_nocb(struct rcu_node *rnp)
>  {
>  }
>
> +static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
> +{
> +	return false;
> +}
> +
>  static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>  				  unsigned long j)
>  {
> --
> 2.38.0.rc1.362.ged0d419d3c-goog
>