Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

From: Jason Gunthorpe
Date: Tue Apr 18 2017 - 14:01:05 EST


On Tue, Apr 18, 2017 at 10:27:47AM -0700, Dan Williams wrote:
> > FWIW, RDMA probably wouldn't want to use a p2mem device either, we
> > already have APIs that map BAR memory to user space, and would like to
> > keep using them. A 'enable P2P for bar' helper function sounds better
> > to me.
>
> ...and I think it's not a helper function as much as asking the bus
> provider "can these two device dma to each other".

What I mean is that I could write this in an RDMA driver:

/* Allow the memory in BAR 1 to be the target of P2P transactions */
pci_enable_p2p_bar(dev, 1);

And not require anything else.
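
To make the shape of that call concrete, here is a minimal userspace
sketch. It is only a model under stated assumptions: struct
model_pci_dev, its p2p_bar_mask field and both functions are
hypothetical stand-ins, not existing kernel APIs, and a real version
would also have to talk to the arch code.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NUM_BARS 6	/* a type 0 PCI header has six BARs */

/* Hypothetical stand-in for struct pci_dev, for illustration only. */
struct model_pci_dev {
	uint64_t bar_len[MODEL_NUM_BARS];	/* 0 means BAR not implemented */
	uint8_t p2p_bar_mask;			/* bit n set: BAR n may be a P2P target */
};

/*
 * Model of the proposed pci_enable_p2p_bar(): mark one BAR as a legal
 * target of P2P transactions. Returns 0 on success, -1 if the BAR index
 * is out of range or the BAR is not implemented.
 */
int model_enable_p2p_bar(struct model_pci_dev *dev, int bar)
{
	if (!dev || bar < 0 || bar >= MODEL_NUM_BARS)
		return -1;
	if (dev->bar_len[bar] == 0)
		return -1;

	dev->p2p_bar_mask |= (uint8_t)(1u << bar);
	return 0;
}

/* Query helper that a mapping path could consult later. */
bool model_bar_is_p2p(const struct model_pci_dev *dev, int bar)
{
	if (!dev || bar < 0 || bar >= MODEL_NUM_BARS)
		return false;
	return (dev->p2p_bar_mask >> bar) & 1u;
}
```

The driver makes one call per BAR and everything else stays behind
the mapping path.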

> The "helper" is the dma api redirecting through a software-iommu
> that handles bus address translation differently than it would
> handle host memory dma mapping.

Not sure. Until we see what arches actually need to do here, it is hard
to design common helpers.

Here are a few obvious things that arches will need to implement to
support this broadly:

- Virtualization might need to do a hypervisor call to get the right
translation, or consult some hypervisor specific description table.

- Anything using IOMMUs for virtualization will need to set up IOMMU
permissions to allow the P2P flow; this might require translation to
an address cookie.

- Fail if the PCI devices are in different domains, or set up hardware to
do completion bus/device/function translation.

- All platforms can succeed if the PCI devices are under the same
'segment', but where segments begin is somewhat platform specific
knowledge. (This is the 'same switch' idea Logan has talked about.)
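
The last two items can be sketched as a small decision function. This
is a userspace model only: the struct, the enum and the segment_id
field are hypothetical, and how a platform would assign segment ids is
exactly the platform specific part. The sketch also takes the simple
"fail" option for different domains rather than modelling hardware
BDF translation.

```c
#include <stdint.h>

/* Possible answers to "can these two devices do P2P?" in this model. */
enum model_p2p_verdict {
	MODEL_P2P_OK,		/* same segment: expected to work everywhere */
	MODEL_P2P_FAIL,		/* different PCI domains: refuse in this model */
	MODEL_P2P_ASK_ARCH,	/* same domain, different segment: arch decides */
};

/* Hypothetical location of a device in the PCI topology. */
struct model_pci_loc {
	uint16_t domain;	/* PCI domain number */
	uint32_t segment_id;	/* assumed platform-assigned 'same switch' id */
};

enum model_p2p_verdict model_p2p_check(const struct model_pci_loc *a,
				       const struct model_pci_loc *b)
{
	/* Cross-domain traffic would need BDF translation hardware. */
	if (a->domain != b->domain)
		return MODEL_P2P_FAIL;

	/* Both devices sit under the same segment. */
	if (a->segment_id == b->segment_id)
		return MODEL_P2P_OK;

	/* Anything else depends on platform knowledge. */
	return MODEL_P2P_ASK_ARCH;
}
```

Only the first two branches are generic; the third is where arch code
would have to be consulted.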

So, we can eventually design helpers for various common scenarios, but
until we see what arch code actually needs to do it seems
premature. Much of this seems to involve interaction with some kind of
hardware, or consultation of some kind of currently platform-specific
data, so I'm not sure what a software-iommu would be doing?

The main thing to agree on is that this code belongs under dma ops and
that arches have to support struct page mapped BAR addresses in their
dma ops inputs. Is that reasonable?

Jason