Re: Documentation/memory-barriers.txt: Is "stores are not speculated" correct?

From: Randy Dunlap
Date: Mon Apr 26 2021 - 11:13:33 EST


On 4/26/21 2:30 AM, Luc Maranget wrote:
>> On Mon, Apr 26, 2021 at 10:23:09AM +0800, szyhb810501.student@xxxxxxxx wrote:
>>>
>>> Hello everyone, I have a question. "Documentation/memory-barriers.txt"
>>> says: However, stores are not speculated. This means that ordering -is-
>>> provided for load-store control dependencies, as in the following example:
>>> 	q = READ_ONCE(a);
>>> 	if (q) {
>>> 		WRITE_ONCE(b, 1);
>>> 	}
>>> Is "stores are not speculated" correct? I
>>> think store instructions can be executed speculatively.
>>> "https://stackoverflow.com/questions/64141366/can-a-speculatively-executed-cpu-branch-contain-opcodes-that-access-ram"
>>> says: Store instructions can also be executed speculatively thanks to the
>>> store buffer. The actual execution of a store just writes the address and
>>> data into the store buffer. Commit to L1d cache happens some time after
>>> the store instruction retires from the ROB, i.e. when the store is known
>>> to be non-speculative, the associated store-buffer entry "graduates"
>>> and becomes eligible to commit to cache and become globally visible.
>>
>> From the viewpoint of other CPUs, the store hasn't really happened
>> until it finds its way into a cacheline. As you yourself note above,
>> if the store is still in the store buffer, it might be squashed when
>> speculation fails.
>>
>> So Documentation/memory-barriers.txt and that stackoverflow entry are
>> not really in conflict, but are instead using words a bit differently
>> from each other. The stackoverflow entry is considering a store to have
>> in some sense happened during a time when it might later be squashed.
>> In contrast, the Documentation/memory-barriers.txt document only considers
>> a store to have completed once it is visible outside of the CPU executing
>> that store.
>>
>> So from a stackoverflow viewpoint, stores can be speculated, but until
>> they are finalized, they must be hidden from other CPUs.
>>
>> From a Documentation/memory-barriers.txt viewpoint, stores don't complete
>> until they update their cachelines, and stores may not be speculated.
>> Some of the actions that lead up to the completion of a store may be
>> speculated, but not the completion of the store itself.
>>
>> Different words, but same effect. Welcome to our world! ;-)
>>
>> Thanx, Paul
>
> Hi all,
>
> Here is a complement to Paul's excellent answer.
>
> The "CPU-local" speculation of stores can be observed
> by the following test (in C11):
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>
> C PPOCA
>
> {}
>
> P0(volatile int* y, volatile int* x) {
>
>   atomic_store(x,1);
>   atomic_store(y,1);
>
> }
>
> P1(volatile int* z, volatile int* y, volatile int* x) {
>
>   int r1=-1; int r2=-1;
>   int r0 = atomic_load_explicit(y,memory_order_relaxed);
>   if (r0) {
>     atomic_store_explicit(z,1,memory_order_relaxed);
>     r1 = atomic_load_explicit(z,memory_order_relaxed);
>     r2 = atomic_load_explicit(x+(r1 & 128),memory_order_relaxed);
>   }
>
> }
>
>
> This is a variation on the MP test.
>
> Because of the conditional "if (..) { S }", the statements "S" can be
> executed speculatively.
>
> More precisely, the store statement writes value 1 into the CPU-local
> structure for variable z. The next load statement reads that value,
> and the last load statement can be performed (speculatively)
> as its address is known.
>
> The resulting outcome is observed, for instance, on a Raspberry Pi 3,
> see attached file.

?attached file?
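
For what it's worth, the distinction Paul draws above (a store may execute
speculatively into a CPU-local structure, but may not become visible to
other CPUs until it is known to be non-speculative) can be sketched with a
toy model in plain C. This is only an illustration under loud assumptions:
every name here (toy_cpu, toy_store, toy_local_load, toy_remote_load,
toy_squash, toy_commit) and the sizes NVARS and SB_MAX are made up, and the
model says nothing about how any real store buffer is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NVARS  4	/* number of modeled memory locations (arbitrary) */
#define SB_MAX 8	/* store-buffer capacity (arbitrary) */

/* One pending store: which location, and what value. */
struct sb_entry {
	size_t addr;
	int val;
};

/* Toy CPU: "cache" is what other CPUs can see; "sb" is private to this CPU. */
struct toy_cpu {
	int cache[NVARS];
	struct sb_entry sb[SB_MAX];
	size_t sb_len;
};

/* Execute a store: it only enters the private store buffer. */
bool toy_store(struct toy_cpu *c, size_t addr, int val)
{
	if (addr >= NVARS || c->sb_len == SB_MAX)
		return false;
	c->sb[c->sb_len].addr = addr;
	c->sb[c->sb_len].val = val;
	c->sb_len++;
	return true;
}

/* Local load: the youngest matching buffered store wins (store forwarding). */
int toy_local_load(const struct toy_cpu *c, size_t addr)
{
	assert(addr < NVARS);
	for (size_t i = c->sb_len; i > 0; i--)
		if (c->sb[i - 1].addr == addr)
			return c->sb[i - 1].val;
	return c->cache[addr];
}

/* Load issued by another CPU: only the cache is visible, never the buffer. */
int toy_remote_load(const struct toy_cpu *c, size_t addr)
{
	assert(addr < NVARS);
	return c->cache[addr];
}

/* Speculation failed: discard every buffered store; the cache is untouched. */
void toy_squash(struct toy_cpu *c)
{
	c->sb_len = 0;
}

/* Speculation confirmed: drain the buffer to the cache in program order. */
void toy_commit(struct toy_cpu *c)
{
	for (size_t i = 0; i < c->sb_len; i++)
		c->cache[c->sb[i].addr] = c->sb[i].val;
	c->sb_len = 0;
}
```

With a zero-initialized struct toy_cpu, toy_store(&c, 2, 1) followed by
toy_local_load(&c, 2) gives 1 while toy_remote_load(&c, 2) still gives 0,
which is the "CPU-local" effect Luc's test relies on. After toy_squash(&c)
both loads give 0; after a second store plus toy_commit(&c) both give 1.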

--
~Randy