Re: [PATCH v8 07/10] iommu/vt-d: Allow qi_submit_sync() to return the QI faults

From: Ethan Zhao
Date: Thu Jan 11 2024 - 02:14:33 EST



On 1/1/2024 11:34 AM, Baolu Lu wrote:
On 12/28/23 2:17 PM, Tian, Kevin wrote:
raw_spin_lock_irqsave(&qi->q_lock, flags);
      /*
@@ -1430,7 +1439,7 @@ int qi_submit_sync(struct intel_iommu *iommu,
struct qi_desc *desc,
           * a deadlock where the interrupt context can wait
indefinitely
           * for free slots in the queue.
           */
-        rc = qi_check_fault(iommu, index, wait_index);
+        rc = qi_check_fault(iommu, index, wait_index, fault);
          if (rc)
              break;
and as replied in another thread let's change qi_check_fault to return
-ETIMEDOUT to break the restart loop when fault pointer is valid.

It's fine to break the retry loop when fault happens and the fault
pointer is valid. Please don't forget to add an explanation comment
around the code. Something like:

/*
 * The caller is able to handle the fault by itself. The IOMMU driver
 * should not attempt to retry this request.
 */

If caller could pass desc with mixed iotlb & devtlb invalidation request,

it would be problematic/difficult for caller or qi_submit_sync() to do

error handling, imagine a case like,

1. call qi_submit_sync() with iotlb & devltb.

2. qi_submit_sync() detects the target device is dead.

3.  break the loop, or will block other invalidation submitter / hang.

4. it is hard for qi_submit_sync() to extract those iotlb invalidation to retry.

5. it is also difficult for caller to retry the iotlb invalidation, or

    leave iotlb out-of-sync. ---there is no sync at all, device is gone.

and if only ITE fault hit, but target device is there && configuration

space reading okay, the ITE is probably left by previous request for

other device, not triggered by this batch, the question is we couldn't

identify the ITE device is just the same as current target ? if the same,

then breaking out is reasonable, or just leave the problem to caller,

something in the request batch is bad, some requests someone request

befoere is bad, but the request is not from the same caller.


Thanks,

Ethan



Best regards,
baolu