Re: [PATCH RFC] srcu: Yet more detail for srcu_readers_active_idx_check() comments

From: Joel Fernandes
Date: Fri Dec 16 2022 - 22:21:46 EST

Next message: Boqun Feng: "Re: possible deadlock in __ata_sff_interrupt"
Previous message: Joel Fernandes: "Re: [PATCH RFC] srcu: Yet more detail for srcu_readers_active_idx_check() comments"
In reply to: Joel Fernandes: "Re: [PATCH RFC] srcu: Yet more detail for srcu_readers_active_idx_check() comments"
Next in thread: Paul E. McKenney: "Re: [PATCH RFC] srcu: Yet more detail for srcu_readers_active_idx_check() comments"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, Dec 16, 2022 at 10:19 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>
> Hi,
> On the related subject of this function, I drew a diagram for one of
> the reasons why the per-CPU unlock counts for a particular index have
> to be scanned before the per-CPU lock counts, and not the other way
> around. Otherwise, a reader that was preempted after reading the index
> can get scheduled during the scan of the inactive index and cause the
> total lock and unlock counts to falsely match:
> https://i.imgur.com/79fDWdQ.png

Better diagram: https://i.imgur.com/PXKJnmW.png
(Added the preemption reasoning for Reader 0).
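
For anyone who prefers code to pictures, here is a minimal single-threaded
sketch of the same argument in C. It is not the kernel code: struct counts,
late_reader() and scan() are hypothetical names, two CPUs are assumed, and
the race is modelled by simply calling the late reader between the two sums.

```c
#include <stdbool.h>

#define NR_CPUS 2

/* Hypothetical per-CPU counters for ONE srcu index; not the kernel layout. */
struct counts {
	unsigned long lock[NR_CPUS];
	unsigned long unlock[NR_CPUS];
};

/* Sum one per-CPU counter array, as a scan pass would. */
static unsigned long sum(const unsigned long *c)
{
	unsigned long s = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		s += c[cpu];
	return s;
}

/*
 * A reader that was preempted after sampling the index resumes on CPU 1
 * and runs a complete, short critical section on that old index.
 */
static void late_reader(struct counts *c)
{
	c->lock[1]++;
	c->unlock[1]++;
}

/* Returns true if the scan concludes "no readers left on this index". */
static bool scan(struct counts *c, bool unlocks_first)
{
	unsigned long locks, unlocks;

	if (unlocks_first) {
		unlocks = sum(c->unlock);
		late_reader(c);		/* races in between the two sums */
		locks = sum(c->lock);
	} else {
		locks = sum(c->lock);
		late_reader(c);		/* races in between the two sums */
		unlocks = sum(c->unlock);
	}
	return locks == unlocks;
}
```

Starting from a reader that is still inside its critical section on CPU 0
(.lock = { 1, 0 }, all unlock counts zero), scan(&c, false) sums locks = 1,
then unlocks = 1, and wrongly reports the index as idle. With
scan(&c, true) the sums are unlocks = 0 and locks = 2, so the late reader
can only make the scan more conservative.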

thanks,

- Joel

> Cheers,
>
> - Joel
>
>
>
> On Fri, Dec 16, 2022 at 11:54 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> >
> >
> >
> > > On Dec 16, 2022, at 11:51 AM, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > >
> > > On Fri, Dec 16, 2022 at 04:32:39PM +0000, Joel Fernandes wrote:
> > >> On Thu, Dec 15, 2022 at 05:09:14PM -0800, Paul E. McKenney wrote:
> > >> [...]
> > > >>>>>> 2. unlock()'s smp_mb() happened before flip+smp_mb(), so the reader
> > > >>>>>> has no new smp_mb() that happens AFTER the flip. So it can
> > > >>>>>> totally sample the old idx again -- that particular reader will
> > > >>>>>> increment twice, but the next time, it will see the flipped one.
> > >>>>>
> > >>>>> I will let you transliterate both. ;-)
> > >>>>
> > >>>> I think I see what you mean now :)
> > >>>>
> > > >>>> I believe the access I am referring to is the read of idx on one side and
> > > >>>> the write to idx on the other. However that is incomplete and I need to
> > > >>>> pair that with some other access on both sides.
> > >>>>
> > >>>> So perhaps this:
> > >>>>
> > >>>> Writer does flip + smp_mb + read unlock counts [1]
> > >>>>
> > >>>> Reader does:
> > >>>> read idx + smp_mb() + increment lock counts [2]
> > >>>>
> > >>>> And subsequently reader does
> > > >>>> smp_mb() + increment unlock count. [3]
> > >>>>
> > >>>> So [1] races with either [2] or [2]+[3].
> > >>>>
> > >>>> Is that fair?
> > >>>
> > >>> That does look much better, thank you!
> > >>
> > >> Perhaps a comment with an ASCII diagram will help?
> > >>
> > >>
> > >> Case 2:
> > >> Both the reader and the updater see each other's writes too late, but because
> > >> of memory barriers on both sides, they will eventually see each other's write
> > >> with respect to their own. This is similar to the store-buffer problem. This
> > > >> lets a single reader contribute a maximum (unlock minus lock) imbalance of 2.
> > >>
> > >> The following diagram shows the subtle worst case followed by a simplified
> > >> store-buffer explanation.
> > >>
> > > >> READER                  UPDATER
> > > >> -------------           ----------
> > > >>                         // idx is initially 0.
> > > >> read_lock() {
> > > >>   READ(idx) = 0;
> > > >>   lock[0]++; -----------------------------------------------,
> > > >>                         flip() {                            |
> > > >>                             smp_mb();                       |
> > > >>   smp_mb();                                                 |
> > > >> }                                                           |
> > > >>                                                             |
> > > >> // RSCS                                                     |
> > > >>                                                             |
> > > >> read_unlock() {                                             |
> > > >>   smp_mb();                                                 |
> > > >>                             idx++; // P                     |
> > > >>                             smp_mb();                       |
> > > >>                         }                                   |
> > > >>                                                             |
> > > >>                         scan_readers_idx(0) {               |
> > > >>                             count all unlock[0];            |
> > > >>                                    |                        |
> > > >>                                    |                        |
> > > >>   unlock[0]++; //X <--not-counted--`-----,                  |
> > > >>                                          |                  |
> > > >> }                                        V                  `------,
> > > >>                         // Will make sure next scan                |
> > > >>                         // will not miss this unlock (X)           |
> > > >>                         // if other side saw flip (P)           ,--`
> > > >>                         // Call this MB [1]                     |
> > > >>                         // Order write(idx) with                |
> > > >>                         // next scan's unlock.                  |
> > > >>                         smp_mb();                           ,---`
> > > >> read_lock() {                                               |
> > > >>   READ(idx) = 0;                                            |
> > > >>   lock[0]++; -------------> count all lock[0];              |
> > > >>   smp_mb();         |   }                                   |
> > > >> }                   |         |                             V
> > > >>                     |         `---> // Incorrect contribution to lock counting
> > > >>                     |               // up to a maximum of 2 times.
> > > >>                     |
> > > >>                     `---> // Pairs with MB [1]. Makes sure that
> > > >>                           // the next read_lock()'s idx read (Y) is ordered
> > > >>                           // with above write to unlock[0] (X).
> > > >>                              |
> > > >> read_unlock() {              |
> > > >>   smp_mb(); <----------------`
> > > >>   unlock[0]++;
> > > >> }
> > > >>
> > > >> read_lock() {
> > > >>   READ(idx) = 1; //Y
> > > >>   lock[1]++;
> > > >>   ...
> > > >> }
> > > >>                         scan_readers_idx(0) {
> > > >>                             count all unlock[0]; //Q
> > > >>                             ...
> > > >>
> > > >>
> > > >> thanks,
> > > >>
> > > >> - Joel
> > > >>
> > > >>                         }
> > >>
> > >> This makes it similar to the store buffer pattern. Using X, Y, P and Q
> > >> annotated above, we get:
> > >>
> > > >> READER                  UPDATER
> > > >> X (write)               P (write)
> > > >>
> > > >> smp_mb();               smp_mb();
> > > >>
> > > >> Y (read)                Q (read)
> > >
> > > Given that this diagram is more than 50 lines long, it might go better in
> > > a design document describing this part of RCU. Perhaps less detail or
> > > segmented, but the same general idea as this guy:
> > >
> > > Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> >
> > Yes, this sounds like a good place to add it and perhaps we refer to it from the C source file? I can take this up to do over the holidays, if you prefer.
> >
> > Thanks,
> >
> > - Joel
> >
> >
> > >
> > > Thoughts?
> > >
> > > Thanx, Paul
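
The store-buffer reduction quoted above (X, Y, P and Q) can be checked with
a small model. This is only a sketch under a strong assumption: each
smp_mb() is modelled as keeping a side's write before its read, so every
execution is some interleaving of the two program orders. The names
run_interleaving() and count_both_miss() are made up for illustration. A
proper check would use a litmus test against the kernel memory model
rather than this enumeration.

```c
#include <stdbool.h>

struct sb_result {
	int y;	/* value of idx seen by the reader's read (Y) */
	int q;	/* value of unlock[0] seen by the updater's scan (Q) */
};

/*
 * Run one schedule. Bit i of 'order' picks who takes step i:
 * 0 = reader (X then Y), 1 = updater (P then Q).
 * Returns true only if each side ran exactly its two steps.
 */
static bool run_interleaving(unsigned int order, struct sb_result *res)
{
	int unlock0 = 0, idx = 0;	/* shared state, both initially 0 */
	int rstep = 0, ustep = 0;

	for (int i = 0; i < 4; i++) {
		if ((order >> i) & 1) {
			if (ustep == 0)
				idx = 1;		/* P: flip */
			else if (ustep == 1)
				res->q = unlock0;	/* Q: scan unlock[0] */
			ustep++;
		} else {
			if (rstep == 0)
				unlock0 = 1;		/* X: unlock[0]++ */
			else if (rstep == 1)
				res->y = idx;		/* Y: next read of idx */
			rstep++;
		}
	}
	return rstep == 2 && ustep == 2;
}

/* Count schedules where the reader misses the flip AND the scan misses X. */
static int count_both_miss(void)
{
	int n = 0;

	for (unsigned int order = 0; order < 16; order++) {
		struct sb_result r = { -1, -1 };

		if (run_interleaving(order, &r) && r.y == 0 && r.q == 0)
			n++;
	}
	return n;
}
```

count_both_miss() returns 0: in every modelled schedule either the reader's
next read of idx sees the flip, or the updater's next scan sees the unlock.
That is the guarantee the paired barriers are meant to provide.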
