Re: [PATCH RFC] srcu: Yet more detail for srcu_readers_active_idx_check() comments

From: Joel Fernandes
Date: Sat Dec 17 2022 - 01:32:58 EST




> On Dec 17, 2022, at 12:15 AM, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Fri, Dec 16, 2022 at 10:21:25PM -0500, Joel Fernandes wrote:
>>> On Fri, Dec 16, 2022 at 10:19 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>>>
>>> Hi,
>>> On the related subject of this function, I drew a diagram for one of
>>> the reasons why per-CPU unlock counts have to be scanned first, for a
>>> particular index, before the per-CPU lock counts, and not the other
>>> way. Otherwise, a reader that got preempted after reading the index,
>>> can suddenly get scheduled during the inactive index's scan, and cause
>>> the total lock and unlock counts to falsely match:
>>> https://i.imgur.com/79fDWdQ.png
>>
>> Better diagram: https://i.imgur.com/PXKJnmW.png
>> (Added the preemption reasoning for Reader 0).
>
> Nice!!!

Thanks!

> The other way to look at this is using a memory-ordering viewpoint.
> This is a member of the message-passing litmus-test family, and the reader
> must read the variables in the opposite order that the writer writes them.

Exactly, thanks. If we read the unlock counts first, which includes the unlock count of that spurious reader 0 in my figure, then would have already seen the reader 0’s lock count when we sum up the lock counts, thanks to MP pattern - so that spurious reader cannot do any damage or imbalance in the delta between the lock and unlock counts. In MP pattern, if we see the flag, then we would have already seen the data (message). And on the other CPU the data is written first followed by the write to the flag. Which is exactly what we do in the SRCU as you mentioned with the reversal of the ordering of the variables we read. I will reason about it this way as well in my notes, thank you!

The other way I reason is, during scanning, if we count unlocks first before locks, it’s OK to miss some unlocks, as that just means we will falsely delay ending of GP. However if we count locks first and then unlocks, we are screwed if we miss counting a lock, because if we count the unlock count for that missed lock, then we might end the GP sooner than we should!!!

Thanks,

- Joel


>
> (See the infamous test6.pdf file, "MP" pattern.)
>
> Thanx, Paul
>
>> thanks,
>>
>> - Joel
>>
>>
>>> Cheers,
>>>
>>> - Joel
>>>
>>>
>>>
>>> On Fri, Dec 16, 2022 at 11:54 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>>>>
>>>>
>>>>
>>>>> On Dec 16, 2022, at 11:51 AM, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>>>>>
>>>>> On Fri, Dec 16, 2022 at 04:32:39PM +0000, Joel Fernandes wrote:
>>>>>> On Thu, Dec 15, 2022 at 05:09:14PM -0800, Paul E. McKenney wrote:
>>>>>> [...]
>>>>>>>>>> 2. unlock()'s smp_mb() happened before Flip+smp_mb() , now the reader
>>>>>>>>>> has no new smp_mb() that happens AFTER the flip happened. So it can
>>>>>>>>>> totally sample the old idx again -- that particular reader will
>>>>>>>>>> increment twice, but the next time, it will see the flipped one.
>>>>>>>>>
>>>>>>>>> I will let you transliterate both. ;-)
>>>>>>>>
>>>>>>>> I think I see what you mean now :)
>>>>>>>>
>>>>>>>> I believe the access I am referring to is the read of idx on one side and
>>>>>>>> the write to idx on the other. However that is incomplete and I need to
>>>>>>>> pair that with some of other access on both sides.
>>>>>>>>
>>>>>>>> So perhaps this:
>>>>>>>>
>>>>>>>> Writer does flip + smp_mb + read unlock counts [1]
>>>>>>>>
>>>>>>>> Reader does:
>>>>>>>> read idx + smp_mb() + increment lock counts [2]
>>>>>>>>
>>>>>>>> And subsequently reader does
>>>>>>>> Smp_mb() + increment unlock count. [3]
>>>>>>>>
>>>>>>>> So [1] races with either [2] or [2]+[3].
>>>>>>>>
>>>>>>>> Is that fair?
>>>>>>>
>>>>>>> That does look much better, thank you!
>>>>>>
>>>>>> Perhaps a comment with an ASCII diagram will help?
>>>>>>
>>>>>>
>>>>>> Case 2:
>>>>>> Both the reader and the updater see each other's writes too late, but because
>>>>>> of memory barriers on both sides, they will eventually see each other's write
>>>>>> with respect to their own. This is similar to the store-buffer problem. This
>>>>>> let's a single reader contribute a maximum (unlock minus lock) imbalance of 2.
>>>>>>
>>>>>> The following diagram shows the subtle worst case followed by a simplified
>>>>>> store-buffer explanation.
>>>>>>
>>>>>> READER UPDATER
>>>>>> ------------- ----------
>>>>>> // idx is initially 0.
>>>>>> read_lock() {
>>>>>> READ(idx) = 0;
>>>>>> lock[0]++; --------------------------------------------,
>>>>>> flip() { |
>>>>>> smp_mb(); |
>>>>>> smp_mb(); |
>>>>>> } |
>>>>>> |
>>>>>> // RSCS |
>>>>>> |
>>>>>> read_unlock() { |
>>>>>> smp_mb(); |
>>>>>> idx++; // P |
>>>>>> smp_mb(); |
>>>>>> } |
>>>>>> |
>>>>>> scan_readers_idx(0) { |
>>>>>> count all unlock[0]; |
>>>>>> | |
>>>>>> | |
>>>>>> unlock[0]++; //X <--not-counted--`-----, |
>>>>>> | |
>>>>>> } V `------,
>>>>>> // Will make sure next scan |
>>>>>> // will not miss this unlock (X) |
>>>>>> // if other side saw flip (P) ,--`
>>>>>> // Call this MB [1] |
>>>>>> // Order write(idx) with |
>>>>>> // next scan's unlock. |
>>>>>> smp_mb(); ,---`
>>>>>> read_lock() { |
>>>>>> READ(idx)=0; |
>>>>>> lock[0]++; ----------------> count all lock[0]; |
>>>>>> smp_mb(); | } |
>>>>>> } | | V
>>>>>> | `---> // Incorrect contribution to lock counting
>>>>>> | // upto a maximum of 2 times.
>>>>>> |
>>>>>> `---> // Pairs with MB [1]. Makes sure that
>>>>>> // the next read_lock()'s' idx read (Y) is ordered
>>>>>> // with above write to unlock[0] (X).
>>>>>> |
>>>>>> rcu_read_unlock() { |
>>>>>> smp_mb(); <---------------`
>>>>>> unlock[0]++;
>>>>>> }
>>>>>>
>>>>>> read_lock() {
>>>>>> READ(idx) = 1; //Y
>>>>>> lock[1]++;
>>>>>> ...
>>>>>> }
>>>>>> scan_readers_idx(0) {
>>>>>> count all unlock[0]; //Q
>>>>>> ...
>>>>>>
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> - Joel
>>>>>>
>>>>>> }
>>>>>>
>>>>>> This makes it similar to the store buffer pattern. Using X, Y, P and Q
>>>>>> annotated above, we get:
>>>>>>
>>>>>> READER UPDATER
>>>>>> X (write) P (write)
>>>>>>
>>>>>> smp_mb(); smp_mb();
>>>>>>
>>>>>> Y (read) Q (read)
>>>>>
>>>>> Given that this diagram is more than 50 lines long, it might go better in
>>>>> a design document describing this part of RCU. Perhaps less detail or
>>>>> segmented, but the same general idea as this guy:
>>>>>
>>>>> Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
>>>>
>>>> Yes, this sounds like a good place to add it and perhaps we refer to it from the C source file? I can take this up to do over the holidays, if you prefer.
>>>>
>>>> Thanks,
>>>>
>>>> - Joel
>>>>
>>>>
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> Thanx, Paul