Re: [PATCH v12 5/5] iommu/vt-d: improve ITE fault handling if target device isn't present

From: Jason Gunthorpe
Date: Tue Jan 30 2024 - 11:33:10 EST


On Tue, Jan 30, 2024 at 04:15:33PM +0800, Ethan Zhao wrote:
> Some tricky situations:
>
> 1. The ATS invalidation request is issued from driver driver, while it is
> in handling, device is removed. this momment, the device instance still
> exists in the bus list. yes, if searching it by BDF, could get it.
>
> 2. The ATS invalidation request is issued from iommu_bus_notifier()
> for surprise removal reason, as shown in above calltrace, device was
> already removed from bus list. if searching it by BDF, return NULL.
>
> 3. The ATS invlidation request is issued from iommu_bus_notifier()
> for safe removal, when is in handling, device is removed or link
> is down. also as #2, device was already removed from bus list.
> if searching it by BDF. got NULL.
> ...
>
> so, searching device by BDF, only works for the ATS invalidation
> request is from device driver.

In the good path, where the hot removal is expected and this is about
coordinating, the IOMMU driver should do an orderly shutdown of the
ATS mechanism:

1 Write to PCI config space to disable the ATS
2 Make the IOMMU respond to ATS requests with UR and set it to BLOCKED
3 Issue a flush of the ATC
4 Wait for all outstanding ATC flushes to complete

If it is a bad/surprise path where the device is already gone then:

1 should automatically not do anything, possibly timing out
2 must succeed
3 should time out
4 should "complete" in that the ATC flushes are all timed out

IMHO all you need to do is not crash/lockup while processing the ATC
timeouts. If this is a surprise path then the ATC timeout might
already happened before the iommu driver remove notifier event happens.

If the driver needs to translate from the IOMMU device table index
into a struct device it is probably best to do that inside the driver.

eg ARM maintains a rbtree in the iommu dev data. (see
arm_smmu_insert_master)

Jason