Re: Memory barrier needed with wake_up_process()?

From: Peter Zijlstra
Date: Mon Sep 05 2016 - 04:33:45 EST


On Sat, Sep 03, 2016 at 10:49:39AM -0400, Alan Stern wrote:
> On Sat, 3 Sep 2016, Alan Stern wrote:
>
> > In other words, we have:
> >
> > CPU 0 CPU 1
> > ----- -----
> > Start DMA Handle DMA-complete irq
> > Sleep until bh->state Set bh->state
> > smp_wmb()
> > Wake up CPU 0
> > smp_rmb()
> > Compute rc based on contents
> > of the DMA buffer
> >
> > This was written many years ago, at a time when I did not fully
> > understand all the details of memory ordering. Do you agree that both
> > of those barriers should really be smp_mb()? That's what Felipe has
> > been testing.
>
> Actually, seeing it written out like this, one realizes that it really
> ought to be:
>
> CPU 0 CPU 1
> ----- -----
> Start DMA Handle DMA-complete irq
> Sleep until bh->state smp_mb()
> set bh->state
> Wake up CPU 0
> smp_mb()
> Compute rc based on contents of the DMA buffer
>
> (Bear in mind also that on some platforms, the I/O operation is carried
> out by PIO rather than DMA.)

I'm sorry, but I still don't follow. This could be because I seldom
interact with DMA agents and therefore am not familiar with that stuff.

Also, smp_mb() doesn't necessarily interact with MMIO/DMA at all IIRC.
Its only defined to do CPU/CPU interactions.

I would very much expect the device IO stuff to order things for us in
this case. "Start DMA" should very much include sufficient fences to
ensure the data under DMA is visible to the DMA engine, this would very
much include things like flushing store buffers and maybe even writeback
caches, depending on platform needs.

At the same time, I would expect "Handle DMA-complete irq", even if it
were done with a PIO polling loop, to guarantee ordering against later
operations such that 'complete' really means that.