Re: [PATCH 1/4] spinlock: Document memory barrier rules

From: Peter Zijlstra
Date: Thu Sep 01 2016 - 04:46:04 EST


On Wed, Aug 31, 2016 at 08:32:18PM +0200, Manfred Spraul wrote:
> On 08/31/2016 06:40 PM, Will Deacon wrote:

> >The litmus test then looks a bit like:
> >
> >CPUm:
> >
> >LOCK(x)
> >smp_mb();
> >RyAcq=0
> >
> >
> >CPUn:
> >
> >Wy=1
> >smp_mb();
> >UNLOCK_WAIT(x)
> Correct.
> >
> >which I think can be simplified to:
> >
> >
> >LOCK(x)
> I thought that here a barrier is required, because Ry=0 could be before
> store of the lock.
> >Ry=0
> RyAcq instead of Ry would required due to the unlock at the end of the
> critical section
> CpuN: <...>
> WyRelease=0
> for the litmus test irrelevant.
> >Wy=1
> >smp_mb(); // Note that this is implied by spin_unlock_wait on PPC and arm64
> >LOCK(x) // spin_unlock_wait behaves like lock; unlock
> >UNLOCK(x)
>
> >[I've removed a bunch of barriers here, that I don't think are necessary
> > for the guarantees you're after]
> >
> >and the question is "Can both CPUs proceed?".
> >
> >Looking at the above, then I don't think that they can. Whilst CPUm can
> >indeed speculate the Ry=0 before successfully taking the lock, if CPUn
> >observes CPUm's read, then it must also observe the lock being held wrt
> >the spin_lock API. That is because a successful LOCK operation by CPUn
> >would force CPUm to replay its LL/SC loop and therefore discard its
> >speculation of y.
> >
> >What am I missing? The code snippet seems to have too many barriers to me!

> spin_unlock_wait() is not necessarily lock()+unlock().
> It can be a simple Rx, or now RxAcq.

Can be, normally, yes. But on power and arm64, the only architectures on
which the ACQUIRE is 'funny' they do the 'pointless' ll/sc cycle in
spin_unlock_wait() to 'fix' things.

So for both power and arm64, you can in fact model spin_unlock_wait()
as LOCK+UNLOCK.

All the other archs have (so far) 'sensible' ACQUIRE semantics and all
this is moot.

[ MIPS _could_ possibly do the 'interesting' ACQUIRE too, but so far
hasn't introduced their fancy barriers. ]

The other interesting case is qspinlock, which does the unordered store
in software, but that is after a necessary atomic (ACQUIRE) operation to
enqueue, which we exploit in the queued_spin_unlock_wait() to order
against. So that too is good, assuming the ACQUIRE is good.

Once Power/ARM64 use qspinlock, they'll need to be careful again.