RE: [PATCH] srcu: Add detection of boot CPU srcu_data structure's->srcu_cblist

From: Zhang, Qiang1
Date: Thu Nov 24 2022 - 00:59:37 EST


On Thu, Nov 24, 2022 at 01:43:42AM +0000, Zhang, Qiang1 wrote:
> On Wed, Nov 23, 2022 at 02:01:50AM +0000, Zhang, Qiang1 wrote:
> >
> > >On Thu, Nov 17, 2022 at 07:20:50PM +0800, Zqiang wrote:
> > > Before SRCU_SIZE_WAIT_CALL srcu_size_state is reached, the srcu
> > > callback only insert to boot-CPU srcu_data structure's->srcu_cblist
> > > and other CPU srcu_data structure's->srcu_cblist is always empty. so
> > > before SRCU_SIZE_WAIT_CALL is reached, need to correctly check boot-CPU
> > > pend cbs in srcu_might_be_idle().
> > >
> > > Signed-off-by: Zqiang <qiang1.zhang@xxxxxxxxx>
> > > ---
> > > kernel/rcu/srcutree.c | 7 +++++--
> > > 1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > > index 6af031200580..6d9af9901765 100644
> > > --- a/kernel/rcu/srcutree.c
> > > +++ b/kernel/rcu/srcutree.c
> > > @@ -1098,8 +1098,11 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp)
> > > unsigned long tlast;
> > >
> > > check_init_srcu_struct(ssp);
> > > - /* If the local srcu_data structure has callbacks, not idle. */
> > > - sdp = raw_cpu_ptr(ssp->sda);
> > > + /* If the boot CPU or local srcu_data structure has callbacks, not idle. */
> > > + if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_CALL)
> > > + sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
> > > + else
> > > + sdp = raw_cpu_ptr(ssp->sda);
> > >
> >
> > Hi Paul,
> >
> > For the convert_to_big default configuration(SRCU_SIZING_AUTO), I found that the srcu is in
> > SRCU_SIZE_SMALL state most of the time in the system, I think this change is meaningful.
> >
> > Thoughts ?
>
> >You are right that this change might make srcu_might_be_idle() return a
> >more accurate value in the common case. As things are now, some other
> >CPU might just now have added a callback, but might not yet have started
> >that new grace period. In that case, we might expedite an SRCU grace
> >period when we would not have otherwise done so. However, this change
> >would also increase contention on the get_boot_cpu_id() CPU's srcu_data
> >structure's ->lock.
> >
> >So the result of that inaccurate return value is that the first two SRCU
> >grace periods in a burst might be expedited instead of only the first one,
> >and even then only if we hit a very narrow race window.
> >
> >Besides, this same thing can happen if two CPUs do synchronize_srcu()
> >at the same time after a long-enough pause between grace periods.
> >Both see no callbacks, so both ask for an expedited grace period.
> >So again, the first two grace periods are expedited.
> >
> >As a result, I am having a very hard time justifying the increased
> >lock contention.
>
> Thanks reply, I have another question, Is this srcu_data structure's->lock necessary?
> the rcu_segcblist_pend_cbs() only check *tails[RCU_DONE_TAIL] is empty.
> or can use rcu_segcblist_get_seglen(RCU_WAIT_TAIL + RCU_NEXT_READY_TAIL + RCU_NEXT_TAIL)
> instead of rcu_segcblist_pend_cbs() to avoid locking? (although this is also inaccurate)

>That extra "*" means that the lock must be acquired. Otherwise, the
>pointed-to callback might even be unmapped between the time we fetched
>the pointer and the time we dereferenced it.

Thanks for detailed explanation , learn more😊.

>
> Thanx, Paul.

> Thanks
> Zqiang
>
> >
> >Am I missing something here?
> >
> > Thanx, Paul
> >
> > Thanks
> > Zqiang
> >
> >
> > >While at it if someone is interested in documenting/commenting on the meaning of
> > >all these SRCU_SIZE_* things, it would be much appreciated!
> > >
> > >In the initial stage(SRCU_SIZE_SMALL), there is no srcu_node structures, only initialized
> > >per-CPU srcu_data structures, at this time, we only use boot-cpu srcu_data structures's ->srcu_cblist
> > >to store srcu callbacks.
> > >if the contention of srcu_struct and srcu_data structure's->lock become busy,
> > >transition to SRCU_SIZE_ALLOC. allocated memory for srcu_node structure at end of the SRCU
> > >grace period.
> > >if allocated success, transition to SRCU_SIZE_WAIT_BARRIER, in this state, invoke
> > >srcu_barrier() will iterate over all possible CPUs, but we still only use boot-CPU srcu_cblist to store callbacks.
> > >the task calling call_srcu() may have access to a new srcu_node structure or may not,
> > >because call_srcu() is protected by SRCU, we may not transition to SRCU_SIZE_WAIT_CALL quickly,
> > >need to wait in this state for a SRCU grace period to end.
> > >After transition to SRCU_SIZE_WAIT_CALL, we enqueue srcu callbacks to own srcu_data structures's ->srcu_cblist.
> > >
> > >Does my description make my commit more detailed?
> > >
> > >Thanks
> > >Zqiang
> > >
> > >
> >
> >
> > >
> > >Thanks.
> > >
> > > spin_lock_irqsave_rcu_node(sdp, flags);
> > > if (rcu_segcblist_pend_cbs(&sdp->srcu_cblist)) {
> > > spin_unlock_irqrestore_rcu_node(sdp, flags);
> > > --
> > > 2.25.1
> > >