Re: Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX)

From: Paul E. McKenney
Date: Tue Dec 15 2015 - 01:15:50 EST


On Mon, Dec 14, 2015 at 06:49:31PM +0000, One Thousand Gnomes wrote:
> On Fri, 11 Dec 2015 14:35:40 -0800
> "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
>
> > On Fri, Dec 11, 2015 at 02:48:03PM +0100, Peter Zijlstra wrote:
> > > On Fri, Dec 11, 2015 at 01:33:14PM +0000, Will Deacon wrote:
> > > > On Fri, Dec 11, 2015 at 01:26:47PM +0100, Peter Zijlstra wrote:
> > >
> > > > > While we're there, the acquire in osq_wait_next() seems somewhat
> > > > > ill-documented too.
> > > > >
> > > > > I _think_ we need ACQUIRE semantics there because we want to strictly
> > > > > order the lock-unqueue A,B,C steps and we get that with:
> > > > >
> > > > > A: SC
> > > > > B: ACQ
> > > > > C: Relaxed
> > > > >
> > > > > Similarly for unlock we want the WRITE_ONCE to happen after
> > > > > osq_wait_next, but in that case we can even rely on the control
> > > > > dependency there.
> > > >
> > > > Even for the lock-unqueue case, isn't B->C ordered by a control dependency
> > > > because C consists only of stores?
> > >
> > > Hmm, indeed. So we could go fully relaxed on it I suppose, since the
> > > same is true for the unlock site.
> >
> > I am probably missing quite a bit on this thread, but don't x86 MMIO
> > accesses to frame buffers need to interact with something more heavyweight
> > than an x86 release store or acquire load in order to remain confined
> > to the resulting critical section?
>
> Depends upon the device and the mapping. There are also CPU errata
> related to write combining on older CPUs (notably Pentium Pro era) which
> result in ordering errors with write combining unless deliberately fenced.
>
> Any PCI access isn't constrained to the critical section unless a PCI
> read from the same device is done and completes before exiting. Even then
> on processors with a separate APIC bus (PPro, PII I think) interrupts are
> asynchronous on their own bus.
>
> The PCI posting rules also apply to DMA.
>
> Finally, we run the IDT WinChip in out-of-order store mode rather than
> full x86 compatibility mode, which, even though it is uniprocessor, means
> the correct fences matter.
>
> Just to ensure total confusion, some video cards have MMIO areas that are
> not in fact memory but a FIFO rigged to look like a block of RAM for
> speed of writing. In those cases the rules are a bit card-dependent.

Sounds like the usual fun and excitement! ;-)

> But seriously, are there any cases where we actually care about this for osq?

Apparently not, given Peter's email.
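For anyone following along, here is a minimal C11 sketch of the distinction Peter and Will drew above: step "B" as an ACQUIRE exchange versus a fully relaxed one, followed by a step "C" consisting only of stores that are issued only when "B" found a successor. The names `struct node`, `node_init()`, `step_b_acquire()`, `step_b_relaxed()` and `step_c()` are hypothetical and only loosely modeled on the A/B/C unqueue steps; this is not the kernel's osq_lock.c code. Note also that C11 itself does not recognize control dependencies, so the relaxed variant is ordered only on the assumption the kernel makes (READ_ONCE()/WRITE_ONCE() plus a real branch, per memory-barriers.txt), not by the C standard.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified queue node; NOT the kernel's
 * struct optimistic_spin_node. */
struct node {
    _Atomic(struct node *) next;
    atomic_int locked;
};

static void node_init(struct node *n)
{
    assert(n != NULL);
    atomic_init(&n->next, NULL);
    atomic_init(&n->locked, 0);
}

/* Step "B" with ACQUIRE: no later load or store (step "C") may be
 * reordered before this exchange, on any architecture. */
static struct node *step_b_acquire(struct node *n)
{
    return atomic_exchange_explicit(&n->next, NULL, memory_order_acquire);
}

/* Step "B" fully relaxed: any ordering against step "C" now comes only
 * from the control dependency in step_c(). The kernel relies on such
 * dependencies (hardware does not speculate stores), but C11 does not
 * promise this ordering. */
static struct node *step_b_relaxed(struct node *n)
{
    return atomic_exchange_explicit(&n->next, NULL, memory_order_relaxed);
}

/* Step "C": stores only, issued only if step "B" returned a successor. */
static int step_c(struct node *next)
{
    if (next == NULL)   /* branch on the value read in step "B" */
        return 0;
    atomic_store_explicit(&next->locked, 1, memory_order_relaxed);
    return 1;
}
```

Single-threaded, both variants behave identically; the difference is only in what another CPU may observe, which is exactly why the relaxed form leans on the control-dependency argument.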

Thanx, Paul
