Re: [PATCH v3 19/20] PCI/P2PDMA: introduce pci_mmap_p2pmem()

From: Christian König
Date: Mon Oct 04 2021 - 02:58:55 EST


I'm not following this discussion too closely, but I try to look into it from time to time.

On 01.10.21 at 19:45, Jason Gunthorpe wrote:
On Fri, Oct 01, 2021 at 11:01:49AM -0600, Logan Gunthorpe wrote:

In device-dax, the refcount is only used to prevent the device, and
therefore the pages, from going away on device unbind. Pages cannot be
recycled, as you say, as they are mapped linearly within the device. The
address space invalidation is done only when the device is unbound.
By address space invalidation I mean invalidation of the VMA that is
pointing to those pages.

device-dax may not have an issue with use-after-VMA-invalidation by
its very nature, since every PFN always points to the same
thing. fsdax and this p2p stuff are different, though.

Before the invalidation, an active flag is cleared to ensure no new
mappings can be created while the unmap is proceeding.
unmap_mapping_range() should sequence itself with the TLB flush and
AFAIK unmap_mapping_range() kicks off the TLB flush and then
returns. It doesn't always wait for the flush to fully finish. I.e. some
cases use RCU to lock the page table against GUP fast, and so the
put_page() doesn't happen until the call_rcu completes, after a grace
period. unmap_mapping_range() does not wait for grace periods.

Wow, wait a second. That is quite a bummer. At least in all GEM/TTM-based graphics drivers, that could potentially cause a lot of trouble.

I've just double-checked, and in quite a number of places we certainly assume that when unmap_mapping_range() returns, the PTE is gone and the TLB flush has completed.

Do you have more information on when and why that can happen?

Thanks,
Christian.