Re: [regression?] Re: [PATCH v6 06/12] mm/gup: track FOLL_PIN pages

From: Alex Williamson
Date: Tue Apr 28 2020 - 16:12:40 EST


On Tue, 28 Apr 2020 16:22:51 -0300
Jason Gunthorpe <jgg@xxxxxxxx> wrote:

> On Tue, Apr 28, 2020 at 01:07:52PM -0600, Alex Williamson wrote:
> > On Tue, 28 Apr 2020 14:49:57 -0300
> > Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> >
> > > On Tue, Apr 28, 2020 at 10:54:55AM -0600, Alex Williamson wrote:
> > > > static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
> > > > {
> > > > struct vfio_pci_device *vdev = device_data;
> > > > @@ -1253,8 +1323,14 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
> > > > vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> > > > vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
> > > >
> > > > + vma->vm_ops = &vfio_pci_mmap_ops;
> > > > +
> > > > +#if 1
> > > > + return 0;
> > > > +#else
> > > > return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
> > > > - req_len, vma->vm_page_prot);
> > > > + vma->vm_end - vma->vm_start, vma->vm_page_prot);
> > >
> > > The remap_pfn_range here is what tells get_user_pages this is a
> > > non-struct page mapping:
> > >
> > > vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
> > >
> > > Which has to be set when the VMA is created, they shouldn't be
> > > modified during fault.
> >
> > Aha, thanks Jason! So fundamentally, pin_user_pages_remote() should
> > never have been faulting in this vma since the pages are non-struct
> > page backed.
>
> gup should not try to pin them.. I think the VM will still call fault
> though, not sure from memory?

Hmm, at commit 3faa52c03f44 I don't see a fault on pin; maybe that's a
bug. But after rebasing to the current top of tree, my DMA mapping now
gets an -EFAULT, so something is still funky :-\
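
To make the flag discussion concrete, here is a minimal userspace sketch
of the check being described, not the kernel code itself. It assumes gup
rejects any vma carrying VM_IO or VM_PFNMAP with -EFAULT, and that those
flags get set at mmap time as in the remap_pfn_range() line quoted above.
The demo_* names and struct demo_vma are hypothetical stand-ins, and the
flag values are assumed to mirror include/linux/mm.h.

```c
#include <errno.h>

/* Illustrative flag bits; values assumed to mirror include/linux/mm.h. */
#define VM_PFNMAP     0x00000400UL
#define VM_IO         0x00004000UL
#define VM_DONTEXPAND 0x00040000UL
#define VM_DONTDUMP   0x04000000UL

/* Hypothetical, stripped-down stand-in for struct vm_area_struct. */
struct demo_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_pgoff;
	unsigned long vm_flags;
};

/* Models only the flag side effect of remap_pfn_range() at mmap time. */
static void demo_mark_pfnmap(struct demo_vma *vma)
{
	vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
}

/*
 * Models the gup-side check: a vma without struct pages behind it
 * cannot be pinned, so the caller gets -EFAULT instead of a fault.
 */
static int demo_gup_check_vma(const struct demo_vma *vma)
{
	if (vma->vm_flags & (VM_IO | VM_PFNMAP))
		return -EFAULT;
	return 0;
}
```

With the "#if 1 / return 0" path in the patch, nothing plays the role of
demo_mark_pfnmap(), so in this model the check passes and gup goes on to
treat the vma as ordinary struct-page memory.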

> > Maybe I was just getting lucky before this commit. For a
> > VM_PFNMAP, vaddr_get_pfn() only needs pin_user_pages_remote() to return
> > error and the vma information that we setup in vfio_pci_mmap().
>
> I've written on this before, vfio should not be passing pages to the
> iommu that it cannot pin eg it should not touch VM_PFNMAP vma's in the
> first place.
>
> It is a use-after-free security issue the way it is..

Where is the use-after-free? Here I'm trying to map device mmio space
through the iommu, which we need to enable p2p when the user owns
multiple devices. The device is owned by the user, bound to vfio-pci,
and can't be unbound while the user has it open. The iommu mappings
are torn down on release. I guess I don't understand the problem.
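
For context on "the vma information that we setup in vfio_pci_mmap()",
here is a rough userspace sketch of the fallback I have in mind once the
pin fails. It assumes the pfn is derived purely from vm_start and
vm_pgoff, where vm_pgoff was set to the BAR's base pfn plus the page
offset at mmap time. demo_pfnmap_vaddr_to_pfn(), struct demo_vma and
DEMO_PAGE_SHIFT are hypothetical names for illustration, not the actual
vaddr_get_pfn() code.

```c
#include <errno.h>

#define DEMO_PAGE_SHIFT 12            /* assumes 4 KiB pages */
#define VM_PFNMAP       0x00000400UL  /* assumed to mirror include/linux/mm.h */

/* Hypothetical, stripped-down stand-in for struct vm_area_struct. */
struct demo_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_pgoff;   /* base pfn of the mapping */
	unsigned long vm_flags;
};

/*
 * Translate a user virtual address inside a VM_PFNMAP vma to a pfn,
 * using only what was recorded in the vma when it was created.
 */
static int demo_pfnmap_vaddr_to_pfn(const struct demo_vma *vma,
				    unsigned long vaddr,
				    unsigned long *pfn)
{
	if (vaddr < vma->vm_start || vaddr >= vma->vm_end)
		return -EFAULT;   /* address is outside this vma */
	if (!(vma->vm_flags & VM_PFNMAP))
		return -EFAULT;   /* not a raw pfn mapping */

	*pfn = ((vaddr - vma->vm_start) >> DEMO_PAGE_SHIFT) + vma->vm_pgoff;
	return 0;
}
```

Note that nothing in this path takes a reference on anything, which may
be the lifetime concern being raised.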

> > only need the fault handler to trigger for user access, which is what I
> > see with this change. That should work for me.
> >
> > > Also the vma code above looked a little strange to me, if you do send
> > > something like this cc me and I can look at it. I did some work like
> > > this for rdma a while ago..
> >
> > Cool, I'll do that. I'd like to be able to zap the vmas from user
> > access at a later point and I have doubts that I'm holding the
> > refs/locks that I need to for that. Thanks,
>
> Check rdma_umap_ops, it does what you described (actually it replaces
> them with 0 page, but along the way it zaps too).

Ok, thanks,

Alex