Re: [PATCH 2/2] iommu/vt-d: Use device rbtree in iopf reporting path

From: Baolu Lu
Date: Sun Feb 18 2024 - 02:02:17 EST


On 2024/2/16 1:55, Jason Gunthorpe wrote:
On Thu, Feb 15, 2024 at 03:22:49PM +0800, Lu Baolu wrote:
The existing IO page fault handler currently locates the PCI device by
calling pci_get_domain_bus_and_slot(). This function searches the list
of all PCI devices until the desired device is found. To improve lookup
efficiency, a helper function named device_rbtree_find() is introduced
to search for the device within the rbtree. Replace
pci_get_domain_bus_and_slot() in the IO page fault handling path.

Co-developed-by: Huang Jiaqing <jiaqing.huang@xxxxxxxxx>
Signed-off-by: Huang Jiaqing <jiaqing.huang@xxxxxxxxx>
Signed-off-by: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
---
drivers/iommu/intel/iommu.h | 1 +
drivers/iommu/intel/iommu.c | 29 +++++++++++++++++++++++++++++
drivers/iommu/intel/svm.c | 14 ++++++--------
3 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 54eeaa8e35a9..f13c228924f8 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -1081,6 +1081,7 @@ void free_pgtable_page(void *vaddr);
void iommu_flush_write_buffer(struct intel_iommu *iommu);
struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent,
const struct iommu_user_data *user_data);
+struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid);
#ifdef CONFIG_INTEL_IOMMU_SVM
void intel_svm_check(struct intel_iommu *iommu);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 09009d96e553..d92c680bcc96 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -120,6 +120,35 @@ static int device_rid_cmp(struct rb_node *lhs, const struct rb_node *rhs)
return device_rid_cmp_key(&key, rhs);
}
+/*
+ * Looks up an IOMMU-probed device using its source ID.
+ *
+ * If the device is found:
+ * - Increments its reference count.
+ * - Returns a pointer to the device.
+ * - The caller must call put_device() after using the pointer.
+ *
+ * If the device is not found, returns NULL.
+ */
+struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid)
+{
+ struct device_domain_info *info;
+ struct device *dev = NULL;
+ struct rb_node *node;
+ unsigned long flags;
+
+ spin_lock_irqsave(&iommu->device_rbtree_lock, flags);
+ node = rb_find(&rid, &iommu->device_rbtree, device_rid_cmp_key);
+ if (node) {
+ info = rb_entry(node, struct device_domain_info, node);
+ dev = info->dev;
+ get_device(dev);

This get_device() is a bit troubling. It eventually calls into
iommu_report_device_fault() which does:

struct dev_iommu *param = dev->iommu;

Which is going to explode if the iomm driver release has already
happened, which is a precondition to getting to a unref'd struct
device.

The driver needs to do something to fence these events during it's
release function.

Yes, theoretically the dev->iommu should be protected in the
iommu_report_device_fault() path.


If we are already doing that then I'd suggest to drop the get_device
and add a big fat comment explaining the special rules about lifetime
that are in effect here.

Otherwise you need to do that barrier rethink the way the locking
works..

A device hot removing goes through at least the following steps:

- Disable PRI.
- Drain all outstanding I/O page faults.
- Stop DMA.
- Unload the device driver.
- Call iommu_release_device() upon the BUS_NOTIFY_REMOVED_DEVICE event.

This sequence ensures that a device cannot generate an I/O page fault
after PRI has been disabled. So in reality it's impossible for a device
to generate an I/O page fault before disabling PRI and then go through
the long journey to reach iommu_release_device() before
iopf_get_dev_fault_param() is called in page fault interrupt handling
thread.

Considering this behavior, adding a comment to the code explaining the
sequence and removing put_device() may be a simpler solution?


Aside from that this looks like a great improvement to me

Thanks,
Jason

Best regards,
baolu