Re: [RFC 0/2] srcu: Remove pre-flip memory barrier

From: Mathieu Desnoyers
Date: Sun Dec 18 2022 - 21:04:11 EST


On 2022-12-18 19:24, Joel Fernandes wrote:
On Sun, Dec 18, 2022 at 7:04 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:

Hi Mathieu,

On Sun, Dec 18, 2022 at 6:38 PM Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:

On 2022-12-18 16:30, Joel Fernandes wrote:
Hi Mathieu,

On Sun, Dec 18, 2022 at 3:56 PM Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:

On 2022-12-18 14:13, Joel Fernandes (Google) wrote:
Hello, I believe the pre-flip memory barrier is not required. The only reason I
can say to remove it, other than the possibility that it is unnecessary, is to
not have extra code that does not help. However, since we are issuing a fully
memory-barrier after the flip, I cannot say that it hurts to do it anyway.

For this reason, please consider these patches as "informational", than a
"please merge". :-) Though, feel free to consider merging if you agree!

All SRCU scenarios pass with these, with 6 hours of testing.

Hi Joel,

Please have a look at the comments in my side-rcu implementation [1, 2].
It is similar to what SRCU does (per-cpu counter based grace period
tracking), but implemented for userspace. The comments explain why this
works without the memory barrier you identify as useless in SRCU.

Following my implementation of side-rcu, I reviewed the SRCU comments
and identified that the barrier "/* E */" appears to be useless. I even
discussed this privately with Paul E. McKenney.

My implementation and comments go further though, and skip the period
"flip" entirely if the first pass observes that all readers (in both
periods) are quiescent.

Actually in SRCU, the first pass scans only 1 index, then does the
flip, and the second pass scans the second index. Without doing a
flip, an index cannot be scanned for forward progress reasons because
it is still "active". So I am curious how you can skip flip and still
scan both indexes? I will dig more into your implementation to learn more.

If we look at SRCU read-side:

int __srcu_read_lock(struct srcu_struct *ssp)
{
int idx;

idx = READ_ONCE(ssp->srcu_idx) & 0x1;
this_cpu_inc(ssp->sda->srcu_lock_count[idx]);
smp_mb(); /* B */ /* Avoid leaking the critical section. */
return idx;
}

If the thread is preempted for a long period of time between load of
ssp->srcu_idx and increment of srcu_lock_count[idx], this means this
thread can appear as a "new reader" for the idx period at any arbitrary
time in the future, independently of which period is the current one
within a future grace period.

As a result, the grace period algorithm needs to inherently support the
fact that a "new reader" can appear in any of the two periods,
independently of the current period state.

As a result, this means that while within period "0", we _need_ to allow
newly coming readers to appear as we scan period "0".

Sure, it already does handle it but that is I believe it is a corner
case, not the norm.

As a result, we can simply scan both periods 0/1 for reader quiescence,
even while new readers appear within those periods.

I think this is a bit dangerous. Yes there is the preemption thing you
mentioned above, but that is bounded since you can only have a fixed
number of tasks that underwent that preemption, and it is quite rare
in the sense, each reader should get preempted just after sampling idx
but not incrementing lock count.

However, if we scan while new readers appear (outside of the above
preemption problem), we can have counter wrap causing a false match
much quicker.
The scan loop is:
check_readers(idx) {
count_all_unlocks(idx);
smp_mb();
count_all_locks(idx);
bool done = (locks == unlocks)
if (done) {
// readers are done, end scan for this idx.
} else {
// try again later
}
}

So if check_readers() got preempted just after the smp_mb(), then you
can have lots of tasks enter and exit the read-side critical section
and increment the locks count. Eventually locks == unlocks will
happen, and it is screwed. Sure this is also theoretical, but yeah
that issue can be made "worse" by scanning active readers
deliberately, especially when such readers can also nest arbitrarily.

As a result, flipping between periods 0/1 is just relevant for forward
progress, not for correctness.

Sure, agreed, forward progress.

Adding to the last statement "But also correctness as described above".

Exactly how many entry/exit of the read-side critical section while the grace period is preempted do you need to trigger this ?

On a 64-bit system, where 64-bit counters are used, AFAIU this need to be exactly 2^64 read-side critical sections.

There are other synchronization algorithms such as seqlocks which are quite happy with much less protection against overflow (using a 32-bit counter even on 64-bit architectures).

For practical purposes, I suspect this issue is really just theoretical.

Or am I missing your point ?

Thanks,

Mathieu



thanks,

- Joel

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com