Re: [RFC 1/2] srcu: Remove comment about prior read lock counts

From: Joel Fernandes
Date: Sun Dec 18 2022 - 16:19:47 EST


Hi Mathieu,
Thanks a lot for sharing details of side-rcu, I will review it over
the holidays! I agree we can/should borrow from that implementation
where possible. I replied to the comments below:

On Sun, Dec 18, 2022 at 4:07 PM Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>
> On 2022-12-18 14:13, Joel Fernandes (Google) wrote:
> > The comment says that if an updater saw lock count updates, then
> > ensure the reader does not see the new srcu_idx.
> >
> > However, there is no memory barrier between a READER reading srcu_idx
> > with respect to incrementing the lock count for that srcu_idx.
> >
> > So what is really happening is, both "B" and "C" will order the current
> > reader's unlock count update, and the _next_ readers lock count update, with
> > respect to the write to the currently active index.
> >
> > Consider first the case of the unlock count update being seen by the UPDATER:
> >
> > (for brevity, the pseudocode shortens "srcu_idx" to "idx")
> >
> > READER UPDATER
> >
> > rcu_read_lock() {
> > idx = READ(idx);
> > lock_count[idx]++;
> >
> > smp_mb(); // B
> > }
> > srcu_flip() {
> > smp_mb(); //E
> > idx++;
> > smp_mb();
> > }
> > rcu_read_unlock() {
> > smp_mb(); // C
> > unlock_count[idx]++;
> > }
> >
> > Consider that the updater saw the unlock count update, and due to this, we
> > expect "E" to make sure that the reader only used the old srcu_idx.
> >
> > However, say the reader used the new srcu_idx because we dropped "E". That is
> > totally OK because both unlock and lock counts of this reader will negate each
> > other during the next scan of the srcu_idx. So we don't have to guarantee at
> > all that the reader used the old srcu_idx, that does not buy us anything
> > because if it used the new one, we would just ignore it during the next scan
> > anyway (the reader is "done").
> >
> > Now lets look at the following case:
> >
> > READER UPDATER
> >
> > rcu_read_lock() {
> > idx = READ(idx);
> > lock_count[idx]++;
> >
> > smp_mb(); // B
> > }
> >
> > rcu_read_unlock() {
> > smp_mb(); // C
> > unlock_count[idx]++;
> > }
> > srcu_flip() {
> > smp_mb(); //E
> > idx++;
> > rcu_read_lock() {
> > idx = READ(idx);
> > lock_count[idx]++;
> >
> > smp_mb(); // B
> > smp_mb();
> > }
> > }
> >
> > Consider that the updater saw the lock count update of the second
> > rcu_read_lock(). It does not matter that we guarantee that the reader sees only
> > the old srcu_idx. This is because, a reader could totally just sample
> > srcu_idx, and stay preempted for long periods of time. So, during any scan, we
> > already have the issue of a preempted-reader randomly springing up with a copy
> > of the index which we consider the "new index". So guaranteeing that the reader
> > saw the old srcu_idx instead of the new one if we saw its lock count updates,
> > also does not buy us anything.
> >
> > Due to these reasons, drop the argument that the reader has to see a certain
> > srcu_idx since we have no control over that anyway, and guaranteeing that does not
> > buy us anything.
>
> I don't understand why this first patch only removes a comment about the
> need to order things, when in fact it's the entire memory barrier /* E
> */ that is useless.
>
> I suspect we should just remove the comment along with the barrier
> without this added step.

Yes, this is what I do in patch 2/2. I just wanted to break it down
because the change log would be too big otherwise. There were 2
reasons given for the "E" memory barrier, so I split it into 2 patches
for that reason so both patches can be discussed possibly separately.
But no issues with squashing as well and merging change logs.

> What SRCU fundamentally does is detect quiescence of all SRCU readers
> between the beginning of the grace period and its completion. In order
> to ensure forward progress, it does so in a two-phase algorithm. What
> the grace period does to detect quiescence is to observe that each of
> the periods (0/1) have no active reader at a given point in the grace
> period.

Yes agreed! By deactivating an index, it is possible to ensure new
readers don't show up on the deactivated one, and so we are bound in
scan time / forward progress. (Except for the special case where a
reader was preempted after picking up the index, and then suddenly
shows up in the scan).

> Then the fact that the period is flipped to send new-coming
> readers into a different period is just to ensure forward progress, and
> is basically irrelevant (order-wise) with the fact that the grace period
> scans both periods and validates that both periods are observed to have
> no active readers.

Yes.

> I'd favor just clarifying the SRCU model in this way, and then remove
> the useless barrier, and also implement improvements to the SRCU
> algorithm and skip the "flip" entirely when we notice (early) that no
> active readers are present in both periods. This is actually similar to
> what I implemented in side-rcu.
> Thoughts ?

Agreed! Thanks so much for sharing this, and I will take a look on my
side and study it. Interesting on that optimization to skip the flip!

Thanks,

- Joel


>
> Thanks,
>
> Mathieu
>
> >
> > Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> > ---
> > kernel/rcu/srcutree.c | 10 ++++------
> > 1 file changed, 4 insertions(+), 6 deletions(-)
> >
> > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > index 1c304fec89c0..d6a4c2439ca6 100644
> > --- a/kernel/rcu/srcutree.c
> > +++ b/kernel/rcu/srcutree.c
> > @@ -983,12 +983,10 @@ static bool try_check_zero(struct srcu_struct *ssp, int idx, int trycount)
> > static void srcu_flip(struct srcu_struct *ssp)
> > {
> > /*
> > - * Ensure that if this updater saw a given reader's increment
> > - * from __srcu_read_lock(), that reader was using an old value
> > - * of ->srcu_idx. Also ensure that if a given reader sees the
> > - * new value of ->srcu_idx, this updater's earlier scans cannot
> > - * have seen that reader's increments (which is OK, because this
> > - * grace period need not wait on that reader).
> > + * Ensure that if a given reader sees the new value of ->srcu_idx, this
> > + * updater's earlier scans cannot have seen that reader's increments
> > + * (which is OK, because this grace period need not wait on that
> > + * reader).
> > */
> > smp_mb(); /* E */ /* Pairs with B and C. */
> >
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
>