RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

From: Tian, Kevin
Date: Fri Apr 02 2021 - 03:30:36 EST


> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Friday, April 2, 2021 12:04 AM
>
> On Thu, Apr 01, 2021 at 02:08:17PM +0000, Liu, Yi L wrote:
>
> > DMA page faults are delivered to the root complex via page request
> > messages, which are per-device according to the PCIe spec. The page
> > request handling flow is:
> >
> > 1) iommu driver receives a page request from device
> > 2) iommu driver parses the page request message to get the RID, PASID,
> > faulted page, requested permissions, etc.
> > 3) iommu driver triggers fault handler registered by device driver with
> > iommu_report_device_fault()
>
> This seems confused.
>
> The PASID should define how to handle the page fault, not the driver.
>
> I don't remember any device specific actions in ATS, so what is the
> driver supposed to do?
>
> > 4) device driver's fault handler signals an eventfd to notify userspace to
> > fetch the information about the page fault. In the VM case, the page
> > fault is injected into the VM and the guest resolves it.
>
> If the PASID is set to 'report page fault to userspace' then some
> event should come out of /dev/ioasid, or be reported to a linked
> eventfd, or whatever.
>
> If the PASID is set to 'SVM' then the fault should be passed to
> handle_mm_fault
>
> And so on.
>
> Userspace chooses what happens based on how they configure the PASID
> through /dev/ioasid.
>
> Why would a device driver get involved here?
>
> > Eric has sent the series below for page fault reporting to a VM with a
> > passthrough device:
> > https://lore.kernel.org/kvm/20210223210625.604517-5-eric.auger@xxxxxxxxxx/
>
> It certainly should not be in vfio pci. Everything using a PASID needs
> this infrastructure, VDPA, mdev, PCI, CXL, etc.
>
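Jason's model, where the PASID configuration rather than a device driver decides what happens to a page request, can be sketched in plain userspace C as follows. Everything here is hypothetical: pasid_fault_policy, ioasid_set_fault_policy(), dispatch_page_req() and the fixed-size policy table are illustration only, not existing kernel or /dev/ioasid interfaces, and the eventfd signalling and handle_mm_fault() call are represented only by the returned action.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-PASID policy, chosen by userspace via /dev/ioasid. */
enum pasid_fault_policy {
	PASID_FAULT_REPORT_USER,	/* queue the fault for userspace */
	PASID_FAULT_SVM,		/* resolve in the kernel via the mm */
};

/* Hypothetical decoded page request (RID, PASID, address, permissions). */
struct page_req {
	uint16_t rid;
	uint32_t pasid;
	uint64_t addr;
	unsigned int perm;
};

/* Hypothetical outcome of dispatching one page request. */
enum fault_action {
	ACTION_NOTIFY_USER,	/* stands in for signalling an eventfd */
	ACTION_HANDLE_MM_FAULT,	/* stands in for calling handle_mm_fault() */
	ACTION_INVALID,		/* PASID unknown or not configured */
};

#define MAX_PASID 16

/* Toy policy table indexed by PASID; a real design would hang the
 * policy off the IOASID object instead of a fixed-size array. */
static int pasid_configured[MAX_PASID];
static enum pasid_fault_policy pasid_policy[MAX_PASID];

static void ioasid_set_fault_policy(uint32_t pasid, enum pasid_fault_policy p)
{
	assert(pasid < MAX_PASID);	/* toy-model precondition */
	pasid_policy[pasid] = p;
	pasid_configured[pasid] = 1;
}

/* The decision is a lookup on the PASID configuration only; no device
 * driver callback is consulted. */
static enum fault_action dispatch_page_req(const struct page_req *req)
{
	if (req->pasid >= MAX_PASID || !pasid_configured[req->pasid])
		return ACTION_INVALID;

	switch (pasid_policy[req->pasid]) {
	case PASID_FAULT_REPORT_USER:
		return ACTION_NOTIFY_USER;
	case PASID_FAULT_SVM:
		return ACTION_HANDLE_MM_FAULT;
	}
	return ACTION_INVALID;
}
```

For example, after configuring PASID 1 as "report to userspace" and PASID 2 as "SVM", a request on PASID 1 yields ACTION_NOTIFY_USER, one on PASID 2 yields ACTION_HANDLE_MM_FAULT, and one on an unconfigured PASID is rejected, whichever driver owns the device.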

This touches an interesting fact:

The fault may be triggered in either the 1st-level or the 2nd-level page
table when nested translation is enabled (the vSVA case). The 1st level is
bound by userspace, which therefore needs to receive the fault event. The
2nd level is managed by VFIO (or vDPA), which needs to fix the fault in the
kernel (e.g. find the HVA for the faulting GPA, call handle_mm_fault and map
GPA->HPA in the IOMMU). Yi's current proposal lets VFIO register the
device fault handler, which then forwards the event through /dev/ioasid
to userspace only if it is a 1st-level fault. Are you suggesting a
pgtable-centric fault reporting mechanism that separates the handlers for
each level, i.e. VFIO registers a handler only for 2nd-level faults and
/dev/ioasid registers a handler for 1st-level faults?
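A rough sketch of that pgtable-centric alternative, in plain userspace C, assuming the IOMMU driver can tell which level of the nested translation faulted (the hypothetical `level` field). The types and functions below are illustrative only, not the existing iommu_register_device_fault_handler() interface: each level gets its own registered owner, so VFIO never sees 1st-level faults and /dev/ioasid never sees 2nd-level ones.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: which stage of a nested translation faulted. Assumes
 * the IOMMU driver can derive this from the fault record. */
enum fault_level {
	FAULT_LEVEL_1 = 1,	/* 1st-level table, bound by userspace */
	FAULT_LEVEL_2 = 2,	/* 2nd-level table, managed by VFIO/vDPA */
};

struct nested_fault {
	uint32_t pasid;
	uint64_t addr;		/* GVA for level 1, GPA for level 2 */
	enum fault_level level;
};

/* Hypothetical handler signature: returns 0 when the fault is consumed. */
typedef int (*fault_handler_t)(const struct nested_fault *f, void *data);

struct level_handler {
	fault_handler_t fn;
	void *data;
};

/* One slot per level; index 0 is unused. Zero-initialized (no owner). */
static struct level_handler handlers[3];

static void register_level_handler(enum fault_level lvl,
				   fault_handler_t fn, void *data)
{
	assert(lvl == FAULT_LEVEL_1 || lvl == FAULT_LEVEL_2);
	handlers[lvl].fn = fn;
	handlers[lvl].data = data;
}

/* Route a fault to whoever owns the faulting level; -1 if nobody does. */
static int report_nested_fault(const struct nested_fault *f)
{
	const struct level_handler *h;

	if (f->level != FAULT_LEVEL_1 && f->level != FAULT_LEVEL_2)
		return -1;
	h = &handlers[f->level];
	if (!h->fn)
		return -1;
	return h->fn(f, h->data);
}

/* Example owners; both only bump a counter in this sketch. */
static int ioasid_l1_handler(const struct nested_fault *f, void *data)
{
	(void)f;
	++*(int *)data;	/* stands in for queueing the event to userspace */
	return 0;
}

static int vfio_l2_handler(const struct nested_fault *f, void *data)
{
	(void)f;
	++*(int *)data;	/* stands in for GPA->HVA, handle_mm_fault, GPA->HPA map */
	return 0;
}
```

With this split the per-device handler in Yi's proposal would no longer need to filter by level; it relies, however, on the IOMMU driver reporting the faulting level reliably.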

Thanks
Kevin