Re: [regression?] Re: [PATCH v6 06/12] mm/gup: track FOLL_PIN pages

From: Jason Gunthorpe
Date: Wed Apr 29 2020 - 19:03:06 EST


On Wed, Apr 29, 2020 at 01:56:33PM -0600, Alex Williamson wrote:
> On Tue, 28 Apr 2020 21:29:03 -0300
> Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> > On Tue, Apr 28, 2020 at 02:12:23PM -0600, Alex Williamson wrote:
> >
> > > > > Maybe I was just getting lucky before this commit. For a
> > > > > VM_PFNMAP, vaddr_get_pfn() only needs pin_user_pages_remote() to return
> > > > > error and the vma information that we setup in vfio_pci_mmap().
> > > >
> > > > I've written on this before, vfio should not be passing pages to the
> > > > iommu that it cannot pin eg it should not touch VM_PFNMAP vma's in the
> > > > first place.
> > > >
> > > > It is a use-after-free security issue the way it is..
> > >
> > > Where is the user after free? Here I'm trying to map device mmio space
> > > through the iommu, which we need to enable p2p when the user owns
> > > multiple devices.
> >
> > Yes, I gathered what the intent was..
> >
> > > The device is owned by the user, bound to vfio-pci, and can't be
> > > unbound while the user has it open. The iommu mappings are torn
> > > down on release. I guess I don't understand the problem.
> >
> > For PFNMAP VMAs the lifecycle rule is basically that the PFN inside
> > the VMA can only be used inside the mmap_sem that read it. Ie you
> > cannot take a PFN outside the mmap_sem and continue to use it.
> >
> > This is because the owner of the VMA owns the lifetime of that PFN,
> > and under the write side of the mmap_sem it can zap the PFN, or close
> > the VMA. Afterwards the VMA owner knows that there are no active
> > reference to the PFN in the system and can reclaim the PFN
> >
> > ie the PFNMAP has no per-page pin counter. All lifetime revolves around
> > the mmap_sem and the vma.
> >
> > What vfio does is take the PFN out of the mmap_sem and program it into
> > the iommu.
> >
> > So when the VMA owner decides the PFN has no references, it actually
> > doesn't: vfio continues to access it beyond its permitted lifetime.
> >
> > HW like mlx5 and GPUs have BAR pages which have security
> > properties. Once the PFN is returned to the driver the security
> > context of the PFN can be reset and re-assigned to another
> > process. Using VFIO a hostile user space can retain access to the BAR
> > page and upon its reassignment access a security context they were not
> > permitted to access.
> >
> > This is why GUP does not return PFNMAP pages and vfio should not carry
> > a reference outside the mmap_sem. It breaks all the lifetime rules.
>
> Thanks for the explanation. I'm inferring that there is no solution to
> this,

Not a particularly good one unfortunately. I've been wanting to use
P2P_DMA pages to solve these kinds of things but they are kind of
expensive.

I have a copy of some draft patches trying to do this

> but why can't we use mmu notifiers to invalidate the iommu on zap or
> close?

Hum.. I think with the new mmu interval notifiers vfio might be able
to manage that without a huge amount of trouble. But the iommu
invalidation needs to be synchronous from a mmu notifier callback - is
that feasible?

But even so, we have all this stuff now for authorizing PCI P2P which
this design completely ignores as well. :(

> I know that at least QEMU won't consider these sorts of mapping
> fatal, so we could possibly change the default and make support for
> such mappings opt-in, but I don't know if I'd break DPDK, or
> potentially users within QEMU that make use of p2p between devices.

I'd heard this was mostly for GPU device assignment? I'd be surprised
if DPDK used this..

Jason