Re: [RFC PATCHES 00/17] IOMMUFD: Deliver IO page faults to user space

From: Baolu Lu
Date: Tue Jun 27 2023 - 22:01:17 EST


On 2023/6/27 2:33, Jason Gunthorpe wrote:
On Sun, Jun 25, 2023 at 02:30:46PM +0800, Baolu Lu wrote:

Agreed. We should avoid workqueue in sva iopf framework. Perhaps we
could go ahead with below code? It will be registered to device with
iommu_register_device_fault_handler() in IOMMU_DEV_FEAT_IOPF enabling
path. Un-registering in the disable path of cause.

This maze needs to be undone as well.

It makes no sense that all the drivers are calling

iommu_register_device_fault_handler(dev, iommu_queue_iopf, dev);

The driver should RX a PRI fault and deliver it to some core code
function, this looks like a good start:

static int io_pgfault_handler(struct iommu_fault *fault, void *cookie)
{
ioasid_t pasid = fault->prm.pasid;
struct device *dev = cookie;
struct iommu_domain *domain;

if (fault->type != IOMMU_FAULT_PAGE_REQ)
return -EOPNOTSUPP;

if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID)
domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0);
else
domain = iommu_get_domain_for_dev(dev);

if (!domain || !domain->iopf_handler)
return -ENODEV;

if (domain->type == IOMMU_DOMAIN_SVA)
return iommu_queue_iopf(fault, cookie);

return domain->iopf_handler(fault, dev, domain->fault_data);

Then we find the domain that owns the translation and invoke its
domain->ops->iopf_handler()

Agreed. The iommu_register_device_fault_handler() could only be called
by the device drivers who want to handle the DMA faults and IO page
faults by themselves in any special ways.

By default, the faults should be dispatched to domain->iopf_handler in a
generic core code.


If the driver created a SVA domain then the op should point to some
generic 'handle sva fault' function. There shouldn't be weird SVA
stuff in the core code.

The weird SVA stuff is really just a generic per-device workqueue
dispatcher, so if we think that is valuable then it should be
integrated into the iommu_domain (domain->ops->use_iopf_workqueue =
true for instance). Then it could route the fault through the
workqueue and still invoke domain->ops->iopf_handler.

The word "SVA" should not appear in any of this.

Yes. We should make it generic. The domain->use_iopf_workqueue flag
denotes that the page faults of a fault group should be put together and
then be handled and responded in a workqueue. Otherwise, the page fault
is dispatched to domain->iopf_handler directly.


Not sure what iommu_register_device_fault_handler() has to do with all
of this.. Setting up the dev_iommu stuff to allow for the workqueue
should happen dynamically during domain attach, ideally in the core
code before calling to the driver.

There are two pointers under struct dev_iommu for fault handling.

/**
* struct dev_iommu - Collection of per-device IOMMU data
*
* @fault_param: IOMMU detected device fault reporting data
* @iopf_param: I/O Page Fault queue and data

[...]

struct dev_iommu {
struct mutex lock;
struct iommu_fault_param *fault_param;
struct iopf_device_param *iopf_param;

My understanding is that @fault_param is a place holder for generic
things, while @iopf_param is workqueue specific. Perhaps we could make
@fault_param static and initialize it during iommu device_probe, as
IOMMU fault is generic on every device managed by an IOMMU.

@iopf_param could be allocated on demand. (perhaps renaming it to a more
meaningful one?) It happens before a domain with use_iopf_workqueue flag
set attaches to a device. iopf_param keeps alive until device_release.


Also, I can understand there is a need to turn on PRI support really
early, and it can make sense to have some IOMMU_DEV_FEAT_IOPF/SVA to
ask to turn it on.. But that should really only be needed if the HW
cannot turn it on dynamically during domain attach of a PRI enabled
domain.

It needs cleaning up..

Yes. I can put this and other cleanup things that we've discussed in a
preparation series and send it out for review after the next rc1 is
released.


Jason

Best regards,
baolu