Re: [PATCH 2/3] dma: override "dma_flags_set_dmaflush" for sn-ia64

From: James Bottomley
Date: Wed Aug 22 2007 - 12:45:43 EST


On Wed, 2007-08-22 at 09:03 -0700, Jesse Barnes wrote:
> On Wednesday, August 22, 2007 7:02:38 am James Bottomley wrote:
> > On Wed, 2007-08-22 at 09:39 +0200, Jes Sorensen wrote:
> > > James Bottomley wrote:
> > > > On Tue, 2007-08-21 at 17:34 -0700, akepner@xxxxxxx wrote:
> > > >> The term "posted DMA" is used to describe this behavior in the Altix
> > > >> Device Driver Writer's Guide, but it may be confusing things here.
> > > >> Maybe a better term will suggest itself if I can clarify....
> > > >
> > > > OK, but posted DMA has a pretty specific meaning in terms of PCI, hence
> > > > the confusion.
> > >
> > > Maybe it would be more better to refer to this as 'out of order DMA'?
> >
> > Or Relaxed ordering DMA ... that's why the readX_relaxed()?
> >
> > > >> On Altix, DMA from a device isn't guaranteed to arrive in host memory
> > > >> in the order it was sent from the device. This reordering can happen
> > > >> in the NUMA interconnect (it's specifically not a PCI reordering.)
> > > >
> > > > This is mmiowb and read_relaxed() again, isn't it?
> > >
> > > I believe it's the same problem, except this time it's when exposing
> > > structures to userland.
> >
> > Hmm, so how does another kernel API exposing mmiowb in a different way
> > help with this? Surely, if the driver is exporting something to user
> > space, it's simply up to the driver to call mmiowb when the user says
> > it's done?
>
> mmiowb() is for PIO->device. This interface is for DMA->memory (see akepner's
> other mail).
>
> The problem is a DMA write (say to a completion queue) from a device may imply
> something about another DMA write from the same device (say the actual data).
> If the completion queue write arrives first (which can happen on sn2), the
> driver must ensure that the rest of the outstanding DMA is complete prior to
> looking at the completion queue status. It can either use a regular PIO read
> to do this (i.e. a non-relaxed one) or set a flag on the completion queue DMA
> address that makes it act as a barrier wrt other DMA, which is what akepner's
> patch does (which should be much more efficient that using a PIO read to
> guarantee DMA writes have completed).

This is a violation of the PCI spec, isn't it, like Matthew pointed out?
The only time a device->host DMA transaction shouldn't follow strict
ordering is when the device sets the relaxed hint in its PCI registers.

James




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/