Re: [PATCH v6 1/4] rcu: Make call_rcu() lazy to save power

From: Frederic Weisbecker
Date: Sat Sep 24 2022 - 18:53:05 EST


On Thu, Sep 22, 2022 at 10:01:01PM +0000, Joel Fernandes (Google) wrote:
> @@ -3902,7 +3939,11 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
> rdp->barrier_head.func = rcu_barrier_callback;
> debug_rcu_head_queue(&rdp->barrier_head);
> rcu_nocb_lock(rdp);
> - WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
> + /*
> + * Flush the bypass list, but also wake up the GP thread as otherwise
> + * bypass/lazy CBs maynot be noticed, and can cause real long delays!
> + */
> + WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, FLUSH_BP_WAKE));

This fixes an issue that goes beyond lazy implementation. It should be done
in a separate patch, handling rcu_segcblist_entrain() as well, with "Fixes: " tag.

And then FLUSH_BP_WAKE is probably not needed anymore.

> if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) {
> atomic_inc(&rcu_state.barrier_cpu_count);
> } else {
> @@ -269,10 +294,14 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
> raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
>
> /*
> - * Bypass wakeup overrides previous deferments. In case
> - * of callback storm, no need to wake up too early.
> + * Bypass wakeup overrides previous deferments. In case of
> + * callback storm, no need to wake up too early.
> */
> - if (waketype == RCU_NOCB_WAKE_BYPASS) {
> + if (waketype == RCU_NOCB_WAKE_LAZY
> + && READ_ONCE(rdp->nocb_defer_wakeup) == RCU_NOCB_WAKE_NOT) {

This can be a plain READ since ->nocb_defer_wakeup is only written under ->nocb_gp_lock.

> + mod_timer(&rdp_gp->nocb_timer, jiffies + jiffies_till_flush);
> + WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
> + } else if (waketype == RCU_NOCB_WAKE_BYPASS) {
> mod_timer(&rdp_gp->nocb_timer, jiffies + 2);
> WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
> } else {
> @@ -512,9 +598,16 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
> }
> // Need to actually to a wakeup.
> len = rcu_segcblist_n_cbs(&rdp->cblist);
> + bypass_len = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> + lazy_len = READ_ONCE(rdp->lazy_len);
> if (was_alldone) {
> rdp->qlen_last_fqs_check = len;
> - if (!irqs_disabled_flags(flags)) {
> + // Only lazy CBs in bypass list
> + if (lazy_len && bypass_len == lazy_len) {
> + rcu_nocb_unlock_irqrestore(rdp, flags);
> + wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_LAZY,
> + TPS("WakeLazy"));

I'm trying to think of a case where rcu_nocb_try_bypass() returns false
(queue to regular list) but then call_rcu() -> __call_rcu_nocb_wake() ends up
seeing a lazy bypass queue even though we are queueing a non-lazy callback
(should have flushed in this case).

Looks like it shouldn't happen, even with concurrent (de-offloading) but just
in case, can we add:

WARN_ON_ONCE(lazy_len != len)

> + } else if (!irqs_disabled_flags(flags)) {
> /* ... if queue was empty ... */
> rcu_nocb_unlock_irqrestore(rdp, flags);
> wake_nocb_gp(rdp, false);

Thanks.