Re: [RFC PATCH 0/4] mm: Add PG_zero support

From: Raj, Ashok
Date: Mon Apr 13 2020 - 11:25:39 EST


On Mon, Apr 13, 2020 at 08:14:32AM -0700, Dave Hansen wrote:
> On 4/13/20 7:49 AM, Alex Williamson wrote:
> >> VFIO's unconditional page pinning is the real problem here IMNHO. They
> >> don't *really* need to pin the memory. We just don't have good
> >> paravirtualized IOMMU support or want to pay the runtime cost for
> >> pin/unpin operations. You *could* totally have speedy VM startup if
> >> only the pages being accessed or having DMA performed to them were
> >> allocated. But, the hacks that are in place mean that everything must
> >> be pinned.
> > Maybe in an SEV or Secure Boot environment we can assume the VM guest
> > OS uses the IOMMU exclusively for DMA, but otherwise the IOMMU is
> > optional (at least for x86, other archs do require IOMMU support
> > afaik). Therefore, how would we know which pages to pin when there are
> > only limited configs where we might be able to lean on the vIOMMU to
> > this extent? Thanks,
>
> You can delay pinning until the device is actually used. That should be
> late enough for the host to figure out whether a paravirtualized IOMMU
> is in place.

When you have a device assigned to a guest, it is in use as soon as the guest
starts probing it. Some devices, like VFs, need DMA even to probe and get
resources assigned from the PF.

The only way we can do this is if the device supports ATS and PRS, and the host
IOMMU driver can tell whether a fault needs to be handled by the host (if the
2nd level walk is at fault) or by the guest (if the walk in the 1st level isn't
resolved).

2nd level faults need to be resolved by the VMM.