Re: Memory barrier needed with wake_up_process()?

From: Alan Stern
Date: Mon Sep 05 2016 - 11:29:32 EST


On Mon, 5 Sep 2016, Peter Zijlstra wrote:

> > Actually, seeing it written out like this, one realizes that it really
> > ought to be:
> >
> > CPU 0 CPU 1
> > ----- -----
> > Start DMA Handle DMA-complete irq
> > Sleep until bh->state smp_mb()
> > set bh->state
> > Wake up CPU 0
> > smp_mb()
> > Compute rc based on contents of the DMA buffer
> >
> > (Bear in mind also that on some platforms, the I/O operation is carried
> > out by PIO rather than DMA.)
>
> I'm sorry, but I still don't follow. This could be because I seldom
> interact with DMA agents and therefore am not familiar with that stuff.

I haven't seen the details of memory ordering requirements in the
presence of DMA spelled out anywhere. The documents I have read merely
state that you have to careful to flush caches before doing DMA OUT and
to avoid filling caches before DMA IN is complete. Neither of those is
an issue here, apparently.

> Also, smp_mb() doesn't necessarily interact with MMIO/DMA at all IIRC.
> Its only defined to do CPU/CPU interactions.

Suppose the DMA master finishes filling up an input buffer and issues a
completion irq to CPU1. Presumably the data would then be visible to
CPU1 if the interrupt handler looked at it. So if CPU1 executes smp_mb
before setting bh->state and waking up CPU0, and if CPU0 executes
smp_mb after testing bh->state and before reading the data buffer,
wouldn't CPU0 then see the correct data in the buffer? Even if CPU0
never did go to sleep?

Or would something more be needed?

> I would very much expect the device IO stuff to order things for us in
> this case. "Start DMA" should very much include sufficient fences to
> ensure the data under DMA is visible to the DMA engine, this would very
> much include things like flushing store buffers and maybe even writeback
> caches, depending on platform needs.
>
> At the same time, I would expect "Handle DMA-complete irq", even if it
> were done with a PIO polling loop, to guarantee ordering against later
> operations such that 'complete' really means that.

That's what I would expect too.

Back in the original email thread where the problem was first reported,
Felipe says that the problem appears to be something else. Here's what
it looks like now, in schematic form:

CPU0
----
get_next_command():
while (bh->state != BUF_STATE_EMPTY)
sleep_thread();
start an input request for bh
while (bh->state != BUF_STATE_FULL)
sleep_thread();

As mentioned above, the input involves DMA and is terminated by an irq.
The request's completion handler is bulk_out_complete():

CPU1
----
bulk_out_complete():
bh->state = BUF_STATE_FULL;
wakeup_thread();

According to Felipe, when CPU0 wakes up and checks bh->state, it sees a
value different from BUF_STATE_FULL. So it goes back to sleep again
and doesn't make any forward progress.

It's possible that something else is changing bh->state when it
shouldn't. But if this were the explanation, why would Felipe see that
the problem goes away when he changes the memory barriers in
sleep_thread() and wakeup_thread() to smp_mb()?

Alan Stern