Re: [RFC PATCHES 00/17] IOMMUFD: Deliver IO page faults to user space

From: Baolu Lu
Date: Wed Jun 28 2023 - 21:07:50 EST


On 2023/6/28 20:49, Jason Gunthorpe wrote:
On Wed, Jun 28, 2023 at 10:00:56AM +0800, Baolu Lu wrote:
If the driver created a SVA domain then the op should point to some
generic 'handle sva fault' function. There shouldn't be weird SVA
stuff in the core code.

The weird SVA stuff is really just a generic per-device workqueue
dispatcher, so if we think that is valuable then it should be
integrated into the iommu_domain (domain->ops->use_iopf_workqueue =
true for instance). Then it could route the fault through the
workqueue and still invoke domain->ops->iopf_handler.

The word "SVA" should not appear in any of this.

Yes. We should make it generic. The domain->use_iopf_workqueue flag
denotes that the page faults of a fault group should be put together and
then be handled and responded in a workqueue. Otherwise, the page fault
is dispatched to domain->iopf_handler directly.

It might be better to have iopf_handler and
iopf_handler_work function pointers to distinguish to two cases.

Both are okay. Let's choose one when we have the code.


Not sure what iommu_register_device_fault_handler() has to do with all
of this.. Setting up the dev_iommu stuff to allow for the workqueue
should happen dynamically during domain attach, ideally in the core
code before calling to the driver.

There are two pointers under struct dev_iommu for fault handling.

/**
* struct dev_iommu - Collection of per-device IOMMU data
*
* @fault_param: IOMMU detected device fault reporting data
* @iopf_param: I/O Page Fault queue and data

[...]

struct dev_iommu {
struct mutex lock;
struct iommu_fault_param *fault_param;
struct iopf_device_param *iopf_param;

My understanding is that @fault_param is a place holder for generic
things, while @iopf_param is workqueue specific.

Well, lets look

struct iommu_fault_param {
iommu_dev_fault_handler_t handler;
void *data;

These two make no sense now. handler is always iommu_queue_iopf. Given
our domain centric design we want the function pointer in the domain,
not in the device. So delete it.

Agreed.


struct list_head faults;
struct mutex lock;

Queue of unhandled/unacked faults? Seems sort of reasonable

It's the list of faults pending for response.

@iopf_param could be allocated on demand. (perhaps renaming it to a more
meaningful one?) It happens before a domain with use_iopf_workqueue flag
set attaches to a device. iopf_param keeps alive until device_release.

Yes

Do this for the iommu_fault_param as well, in fact, probably just put
the two things together in one allocation and allocate if we attach a
PRI using domain. I don't think we need to micro optimze further..

Yeah, let me try this.

Best regards,
baolu