RE: [RFC] /dev/ioasid uAPI proposal

From: Parav Pandit
Date: Tue Jun 01 2021 - 13:30:59 EST


> From: Tian, Kevin <kevin.tian@xxxxxxxxx>
> Sent: Thursday, May 27, 2021 1:28 PM

> 5.6. I/O page fault
> +++++++++++++++
>
> (uAPI is TBD. Here is just about the high-level flow from host IOMMU driver
> to guest IOMMU driver and backwards).
>
> - Host IOMMU driver receives a page request with raw fault_data {rid,
> pasid, addr};
>
> - Host IOMMU driver identifies the faulting I/O page table according to
> information registered by IOASID fault handler;
>
> - IOASID fault handler is called with raw fault_data (rid, pasid, addr), which
> is saved in ioasid_data->fault_data (used for response);
>
> - IOASID fault handler generates an user fault_data (ioasid, addr), links it
> to the shared ring buffer and triggers eventfd to userspace;
>
> - Upon received event, Qemu needs to find the virtual routing information
> (v_rid + v_pasid) of the device attached to the faulting ioasid. If there are
> multiple, pick a random one. This should be fine since the purpose is to
> fix the I/O page table on the guest;
>
> - Qemu generates a virtual I/O page fault through vIOMMU into guest,
> carrying the virtual fault data (v_rid, v_pasid, addr);
>
Why does it have to be through vIOMMU?
For a VFIO PCI device, have you considered reusing the same PRI interface to inject the page fault into the guest?
That would eliminate the need for any new v_rid.
It would also route the page fault request and response through the right VFIO device.

> - Guest IOMMU driver fixes up the fault, updates the I/O page table, and
> then sends a page response with virtual completion data (v_rid, v_pasid,
> response_code) to vIOMMU;
>
What about fixing up the fault in the MMU page table in the guest as well?
Or did you mean both when you said "updates the I/O page table" above?

It is unclear to me whether a single nested page table is maintained, or two (one for CR3 references and the other for the IOMMU).
Can you please clarify?

> - Qemu finds the pending fault event, converts virtual completion data
> into (ioasid, response_code), and then calls a /dev/ioasid ioctl to
> complete the pending fault;
>
If a virtual PRI request/response interface is implemented for VFIO PCI devices, it can be a generic interface shared across multiple vIOMMUs.

> - /dev/ioasid finds out the pending fault data {rid, pasid, addr} saved in
> ioasid_data->fault_data, and then calls iommu api to complete it with
> {rid, pasid, response_code};
>
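
To check that I read the save-and-complete part of the flow above correctly, here is a minimal userspace sketch in C. It only models the bookkeeping: the raw fault (rid, pasid, addr) is saved in ioasid_data->fault_data, userspace sees only (ioasid, addr), and the completion is rebuilt as (rid, pasid, response_code). All struct layouts and the function names ioasid_report_fault() and ioasid_complete_fault() are hypothetical, since the RFC leaves the uAPI as TBD. The ring buffer, eventfd and the real IOMMU calls are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts; the RFC does not define them. */
struct raw_fault {
	uint16_t rid;
	uint32_t pasid;
	uint64_t addr;
};

struct user_fault {
	uint32_t ioasid;
	uint64_t addr;
};

struct page_response {
	uint16_t rid;
	uint32_t pasid;
	int response_code;
};

struct ioasid_data {
	uint32_t ioasid;
	bool fault_pending;
	struct raw_fault fault_data;	/* saved for the response */
};

/*
 * Fault path: save the raw fault data and build the record that
 * userspace would see. Routing information (rid, pasid) is not exposed.
 */
static struct user_fault ioasid_report_fault(struct ioasid_data *d,
					     const struct raw_fault *raw)
{
	assert(d != NULL && raw != NULL);

	d->fault_data = *raw;
	d->fault_pending = true;

	return (struct user_fault){ .ioasid = d->ioasid, .addr = raw->addr };
}

/*
 * Completion path: userspace passes only a response code for the ioasid.
 * The saved raw data supplies rid and pasid for the page response.
 * Returns 0 on success, -1 if no fault is pending.
 */
static int ioasid_complete_fault(struct ioasid_data *d, int response_code,
				 struct page_response *out)
{
	assert(d != NULL && out != NULL);

	if (!d->fault_pending)
		return -1;

	out->rid = d->fault_data.rid;
	out->pasid = d->fault_data.pasid;
	out->response_code = response_code;
	d->fault_pending = false;

	return 0;
}
```

The sketch tracks one pending fault per ioasid for brevity. A real implementation would presumably need to track several outstanding faults.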