Re: [RFC 0/2] srcu: Remove pre-flip memory barrier

From: Mathieu Desnoyers
Date: Wed Dec 21 2022 - 12:11:45 EST


On 2022-12-21 06:59, Frederic Weisbecker wrote:
On Tue, Dec 20, 2022 at 10:34:19PM -0500, Mathieu Desnoyers wrote:
[...]

The memory ordering constraint I am concerned about here is:

* [...] In addition,
* each CPU having an SRCU read-side critical section that extends beyond
* the return from synchronize_srcu() is guaranteed to have executed a
* full memory barrier after the beginning of synchronize_srcu() and before
* the beginning of that SRCU read-side critical section. [...]

So if we have a SRCU read-side critical section that begins after the beginning
of synchronize_srcu, but before its first memory barrier, it would miss the
guarantee that the full memory barrier is issued before the beginning of that
SRCU read-side critical section. IOW, that memory barrier needs to be at the
very beginning of the grace period.

I'm confused, what's wrong with this ?

UPDATER READER
------- ------
STORE X = 1 STORE srcu_read_lock++
// rcu_seq_snap() smp_mb()
smp_mb() READ X
// scans
READ srcu_read_lock

What you refer to here is only memory ordering of the store to X and load from X wrt loading/increment of srcu_read_lock, which is internal to the srcu implementation. If we really want to model the provided high-level memory ordering guarantees, we should consider a scenario where SRCU is used for its memory ordering properties to synchronize other variables.

I'm concerned about the following Dekker scenario, where synchronize_srcu() and srcu_read_lock/unlock would be used instead of memory barriers:

Initial state: X = 0, Y = 0

Thread A Thread B
---------------------------------------------
STORE X = 1 STORE Y = 1
synchronize_srcu()
srcu_read_lock()
r1 = LOAD X
srcu_read_unlock()
r0 = LOAD Y

BUG_ON(!r0 && !r1)

So in the synchronize_srcu implementation, there appears to be two
major scenarios: either srcu_gp_start_if_needed starts a gp or expedited gp, or it uses an already started gp/expedited gp. When snapshotting with rcu_seq_snap, the fact that the memory barrier is after the ssp->srcu_gp_seq load means that it does not order prior memory accesses before that load. This sequence value is then used to identify which gp_seq to wait for when piggy-backing on another already-started gp. I worry about reordering between STORE X = 1 and load of ssp->srcu_gp_seq, which is then used to piggy-back on an already-started gp.

I suspect that the implicit barrier in srcu_read_lock() invoked at the
beginning of srcu_gp_start_if_needed() is really the barrier that makes
all this behave as expected. But without documentation it's rather hard to follow.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com