Re: [PATCH] vfio/pci: take mmap write lock for io_remap_pfn_range

From: Yan Zhao
Date: Fri May 12 2023 - 04:28:05 EST


On Thu, May 11, 2023 at 10:07:06AM -0600, Alex Williamson wrote:
> On Wed, 10 May 2023 17:41:06 -0300
> Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> > On Mon, May 08, 2023 at 02:57:15PM -0600, Alex Williamson wrote:
> >
> > > We already try to set the flags in advance, but there are some
> > > architectural flags like VM_PAT that make that tricky. Cedric has been
> > > looking at inserting individual pages with vmf_insert_pfn(), but that
> > > incurs a lot more faults and therefore latency vs remapping the entire
> > > vma on fault. I'm not convinced that we shouldn't just attempt to
> > > remove the fault handler entirely, but I haven't tried it yet to know
> > > what gotchas are down that path. Thanks,
> >
> > I thought we did it like this because there were races otherwise with
> > PTE insertion and zapping? I don't remember well anymore.
>
> TBH, I don't recall if we tried a synchronous approach previously. The
> benefit of the faulting approach was that we could track the minimum
> set of vmas which are actually making use of the mapping and throw that
> tracking list away when zapping. Without that, we need to add vmas
> both on mmap and in vm_ops.open, removing only in vm_ops.close, and
> acquire all the proper mm locking for each vma to re-insert the
> mappings.
>
> > I vaguely remember the address_space conversion might help remove the
> > fault handler?
>
> Yes, this did remove the fault handler entirely, it's (obviously)
> dropped off my radar, but perhaps in the interim we could switch to
> vmf_insert_pfn() and revive the address space series to eventually
> remove the fault handling and vma list altogether.
>
> For reference, I think this was the last posting of the address space
> series:
>
> https://lore.kernel.org/all/162818167535.1511194.6614962507750594786.stgit@omen/

Just took a quick look at this series.
One question: it looks like it still needs to call io_remap_pfn_range() in
places like vfio_basic_config_write() for PCI_COMMAND, and on device reset,
so the mmap write lock is still required around vdev->memory_lock.
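
To make the ordering concern concrete, here is a rough userspace toy model
(not kernel code). It assumes the mmap write lock must be the outer lock and
vdev->memory_lock the inner one whenever mappings are re-inserted. All toy_*
names are hypothetical stand-ins; the booleans only model "held" state and
do no real locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two locks discussed in this thread. */
struct toy_locks {
	bool mmap_write_held;	/* models mmap_write_lock(mm) */
	bool memory_lock_held;	/* models down_write(&vdev->memory_lock) */
};

/* Outer lock: refuse if the inner lock is already held (wrong order). */
static bool toy_mmap_write_lock(struct toy_locks *l)
{
	if (l->mmap_write_held || l->memory_lock_held)
		return false;
	l->mmap_write_held = true;
	return true;
}

/* Inner lock: may be taken with or without the outer lock held. */
static bool toy_memory_lock(struct toy_locks *l)
{
	if (l->memory_lock_held)
		return false;
	l->memory_lock_held = true;
	return true;
}

/* Release in reverse order of acquisition. */
static void toy_unlock_all(struct toy_locks *l)
{
	l->memory_lock_held = false;
	l->mmap_write_held = false;
	assert(!l->mmap_write_held && !l->memory_lock_held);
}

/*
 * Models a remap of the whole vma: allowed only with both locks held
 * (assumption taken from the discussion above, not from the kernel API).
 */
static bool toy_remap(const struct toy_locks *l)
{
	return l->mmap_write_held && l->memory_lock_held;
}

/* Models a PCI_COMMAND write path that re-inserts the mappings. */
static bool toy_config_write_remap(struct toy_locks *l)
{
	bool ok = toy_mmap_write_lock(l) && toy_memory_lock(l) && toy_remap(l);

	toy_unlock_all(l);
	return ok;
}
```

In this model a path that takes memory_lock first can no longer acquire the
mmap write lock, so the remap is refused. That is the situation a config
write or reset path would be in if it only held vdev->memory_lock.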