Re: [RFC PATCH v10 4/5] iommu/vt-d: don't issue ATS Invalidation request when device is disconnected

From: Ethan Zhao
Date: Wed Jan 10 2024 - 23:16:53 EST



On 1/11/2024 10:24 AM, Baolu Lu wrote:
On 1/10/24 4:37 PM, Ethan Zhao wrote:

On 1/10/2024 1:24 PM, Baolu Lu wrote:
On 12/29/23 1:05 AM, Ethan Zhao wrote:
Except those aggressive hotplug cases - surprise remove a hotplug device
while its safe removal is requested and handled in process by:

1. pull it out directly.
2. turn off its power.
3. bring the link down.
4. just died there that moment.

etc, in a word, 'gone' or 'disconnected'.

Mostly are regular normal safe removal and surprise removal unplug.
these hot unplug handling process could be optimized for fix the ATS
Invalidation hang issue by calling pci_dev_is_disconnected() in function
devtlb_invalidation_with_pasid() to check target device state to avoid
sending meaningless ATS Invalidation request to iommu when device is gone.
(see IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1)

For safe removal, device wouldn't be removed untill the whole software
handling process is done, it wouldn't trigger the hard lock up issue
caused by too long ATS Invalidation timeout wait. In safe removal path,
device state isn't set to pci_channel_io_perm_failure in
pciehp_unconfigure_device() by checking 'presence' parameter, calling
pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will return
false there, wouldn't break the function.

For surprise removal, device state is set to pci_channel_io_perm_failure in
pciehp_unconfigure_device(), means device is already gone (disconnected)
call pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will
return true to break the function not to send ATS Invalidation request to
the disconnected device blindly, thus avoid the further long time waiting
triggers the hard lockup.

safe removal & surprise removal

pciehp_ist()
    pciehp_handle_presence_or_link_change()
      pciehp_disable_slot()
        remove_board()
          pciehp_unconfigure_device(presence)

Tested-by: Haorong Ye <yehaorong@xxxxxxxxxxxxx>
Signed-off-by: Ethan Zhao <haifeng.zhao@xxxxxxxxxxxxxxx>
---
  drivers/iommu/intel/pasid.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 715943531091..3d5ed27f39ef 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -480,6 +480,8 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
      if (!info || !info->ats_enabled)
          return;
  +    if (pci_dev_is_disconnected(to_pci_dev(dev)))
+        return;

Why do you need the above after changes in PATCH 2/5? It's unnecessary
and not complete. We have other places where device TLB invalidation is
issued, right?

This one could be regarded as optimization, no need to trapped into rabbit

hole if we could predict the result. because the bad thing is we don't know

what response to us in the rabbit hole from third party switch (bridges will

feedback timeout to requester as PCIe spec mentioned if the endpoint is

gone).

The IOMMU hardware has its own timeout mechanism. This timeout might
happen if:

1) The link to the endpoint is broken, so the invalidation completion
   message is lost on the way.
2) The device has a longer timeout value, so the device is still busy
   with handling the cache invalidation when IOMMU's timeout is
   triggered.

Here, we are doing the following:

For Case 1, we return -ETIMEDOUT directly. For Case 2, we attempt to
retry.

Yes, Intel VT-d will setup a hardware timer if devtlb invalidation issued and

wait descripton submitted, that hardware timer is limited resource, will tick

till gets the timeout if the endpoint is dead/broken etc.

even we bail out in qi_submit_sync() for case #1, the hardware timer still

ticks there, if many of such request issued, the iommu will run out of

hardware resouce.  so we should avoid such case as possible as we could.

though the Intel VT-d says the timeout value will not more than "

PCIe read timeout", but in fact, we got more than 12 seconds before get

ITE.


for case #2, the retry has pre-conditon as I know, there is fault, cleared.

So I call it "rabbit hole".

To run into that rabbit hole is last choice, not best.


Thanks,

Ethan


Best regards,
baolu