Re: Enabling peer to peer device transactions for PCIe devices

From: Serguei Sagalovitch
Date: Wed Nov 30 2016 - 12:29:40 EST


On 2016-11-30 11:23 AM, Jason Gunthorpe wrote:
Yes, that sounds fine. Can we simply kill the process from the GPU driver?
Or do we need to extend the OOM killer to manage GPU pages?
I don't know..
We could use send_sig_info() to send a signal from the kernel to user space, so theoretically the GPU driver
could send SIGKILL to a process.
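
To show what the target process would experience, here is a user-space analogue only, not driver code. It
uses kill(2) from a parent process in place of send_sig_info() from the kernel, and the helper name
kill_child_and_report() is made up for this sketch. The point is just that SIGKILL cannot be caught, so
the process terminates without any chance to clean up its GPU state itself.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that only waits for signals, send it
 * SIGKILL, and return the signal number that terminated it (or -1 on error).
 * kill(2) stands in for the kernel-side send_sig_info() call.
 */
int kill_child_and_report(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: sleep until a signal arrives; SIGKILL cannot be handled. */
        for (;;)
            pause();
    }

    /* Parent: deliver SIGKILL, then reap the child and inspect its status. */
    if (kill(pid, SIGKILL) != 0)
        return -1;

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Whatever resources the process held still have to be released by the driver after the kill.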

On Wed, Nov 30, 2016 at 12:45:58PM +0200, Haggai Eran wrote:
I think we can achieve the kernel's needs with ZONE_DEVICE and DMA-API support
for peer to peer. I'm not sure we need vmap. We need a way to have a scatterlist
of MMIO pfns, and ZONE_DEVICE allows that.
I do not think that using the DMA-API as it is (at least in its current form) is the best solution:

- It deals with handles/fds for a whole allocation, but a client could (and will) use sub-allocations, and it is
theoretically possible to "merge" several allocations into one from the GPU's perspective.
- It requires knowing in advance which allocations to export, but because "sharing" is controlled from user space, it
means that we must "export" all allocations by default.
- It deals with fds/handles, but a user application may work with addresses/pointers.

Also, the current DMA-API forces us to do all of the DMA table programming every time, regardless of whether
the location has changed. With vma / mmu we are able to install a notifier to intercept
changes in location and update the translation tables only as needed (so we do not need to keep
the pages pinned with get_user_pages()).
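
A rough user-space model of the notifier idea, to make the difference concrete. This is a sketch under
assumptions, not kernel code: struct dev_table, struct address_space, dev_attach(), dev_on_move() and
as_move_page() are all hypothetical names, and the "programmed" counter only stands in for the cost of
writing device translation entries. The real mechanism would be an MMU notifier registered by the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PAGES 8

/* Hypothetical device translation table: one entry per virtual page. */
struct dev_table {
    uint64_t phys[MAX_PAGES];  /* location the device currently targets */
    unsigned programmed;       /* number of entry writes performed so far */
};

/* Hypothetical notifier type: called only when a page's location changes. */
typedef void (*move_notifier_fn)(void *ctx, size_t page, uint64_t new_phys);

/* Hypothetical CPU-side view of where each page currently lives. */
struct address_space {
    uint64_t phys[MAX_PAGES];
    move_notifier_fn notify;
    void *notify_ctx;
};

/* Notifier callback: rewrite just the one entry that moved. */
void dev_on_move(void *ctx, size_t page, uint64_t new_phys)
{
    struct dev_table *t = ctx;

    assert(page < MAX_PAGES);
    t->phys[page] = new_phys;
    t->programmed++;
}

/* Program every entry once, then rely on the notifier for later changes. */
void dev_attach(struct dev_table *t, struct address_space *as)
{
    t->programmed = 0;
    for (size_t i = 0; i < MAX_PAGES; i++) {
        t->phys[i] = as->phys[i];
        t->programmed++;
    }
    as->notify = dev_on_move;
    as->notify_ctx = t;
}

/* Relocate a page; the device is told only if the location really changed. */
void as_move_page(struct address_space *as, size_t page, uint64_t new_phys)
{
    assert(page < MAX_PAGES);
    if (as->phys[page] == new_phys)
        return;
    as->phys[page] = new_phys;
    if (as->notify)
        as->notify(as->notify_ctx, page, new_phys);
}
```

In this model, attaching costs MAX_PAGES entry writes, and each later move costs exactly one more; a
transfer that finds nothing moved costs none. Re-programming the whole table per transfer would instead
cost MAX_PAGES writes every time.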