Re: [PATCH 1/1] iommu/vt-d: Add MTL to quirk list to skip TE disabling

From: Baolu Lu
Date: Thu Nov 16 2023 - 02:25:01 EST


On 2023/11/16 11:27, Tian, Kevin wrote:
From: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
Sent: Thursday, November 16, 2023 10:23 AM

From: "Abdul Halim, Mohd Syazwan"
<mohd.syazwan.abdul.halim@xxxxxxxxx>

The VT-d spec requires (10.4.4 Global Command Register, TE field) that:

Hardware implementations supporting DMA draining must drain any in-flight
DMA read/write requests queued within the Root-Complex before completing
the translation enable command and reflecting the status of the command
through the TES field in the Global Status register.

this talks about 'enable'...


Unfortunately, some integrated graphics devices fail to do so after some
kind of power state transition. As a result, the system might get stuck in
iommu_disable_translation(), waiting for the completion of the TE transition.

...while this fixes 'disable'. wrong citation?

Right. It's confusing. I will change it to below.

"
...before switching address translation on or off and reflecting the
status of the command through the TES field in the Global Status
register.
"


@@ -5080,7 +5080,7 @@ static void quirk_igfx_skip_te_disable(struct pci_dev *dev)
 	ver = (dev->device >> 8) & 0xff;
 	if (ver != 0x45 && ver != 0x46 && ver != 0x4c &&
 	    ver != 0x4e && ver != 0x8a && ver != 0x98 &&
-	    ver != 0x9a && ver != 0xa7)
+	    ver != 0x9a && ver != 0xa7 && ver != 0x7d)
 		return;
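
For readers following the hunk, the check can be sketched as a standalone
lookup on the high byte of the PCI device ID. This is only an illustration:
igfx_needs_te_quirk() and igfx_te_quirk_vers[] are hypothetical names that
do not exist in the driver, and the byte values are simply copied from the
hunk.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* High bytes of the device IDs listed in the patched check. */
static const uint8_t igfx_te_quirk_vers[] = {
	0x45, 0x46, 0x4c, 0x4e, 0x8a, 0x98, 0x9a, 0xa7,
	0x7d, /* the value this patch adds */
};

/* Hypothetical helper: true if the ID's high byte is in the list. */
static bool igfx_needs_te_quirk(uint16_t device)
{
	uint8_t ver = (device >> 8) & 0xff;

	for (size_t i = 0; i < sizeof(igfx_te_quirk_vers); i++)
		if (igfx_te_quirk_vers[i] == ver)
			return true;
	return false;
}
```

With this sketch, any ID of the form 0x7dNN matches after the patch, while
an ID whose high byte is not in the table is left alone.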


this fix alone is fine, but I found this quirk overall is not cleanly handled.

Basically it sets iommu_skip_te_disable as true, leading to early return
in iommu_disable_translation():

	if (iommu_skip_te_disable && iommu->drhd->gfx_dedicated &&
	    (cap_read_drain(iommu->cap) || cap_write_drain(iommu->cap)))
		return;
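
The condition above can be modelled in isolation to see when the disable is
actually skipped. This is a simplified stand-in, not driver code: struct
te_state and te_disable_is_skipped() are made-up names, and each field only
mirrors one term of the quoted check.

```c
#include <stdbool.h>

/* Simplified stand-in for the state the quoted check reads. */
struct te_state {
	bool skip_te_disable;	/* models iommu_skip_te_disable */
	bool gfx_dedicated;	/* models iommu->drhd->gfx_dedicated */
	bool read_drain;	/* models cap_read_drain(iommu->cap) */
	bool write_drain;	/* models cap_write_drain(iommu->cap) */
};

/* True when the modelled disable path would return early. */
static bool te_disable_is_skipped(const struct te_state *s)
{
	return s->skip_te_disable && s->gfx_dedicated &&
	       (s->read_drain || s->write_drain);
}
```

In this model the early return needs all three parts at once: the quirk
flag, a graphics-dedicated unit, and at least one of the two drain
capabilities.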

However the caller of iommu_disable_translation() is not aware of this
quirk and continues as if the iommu is disabled. IMHO this is problematic
w/o meeting the caller's assumption.

e.g. kdump and suspend. We may want to abort those paths early in case
of such quirk...
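
One way to picture the concern is a disable routine that reports whether it
really turned translation off, so that a caller can stop early. The sketch
below is purely hypothetical and is not what the driver does today: struct
iommu_model, model_disable_translation() and model_suspend() are invented
for illustration, and -EBUSY is an arbitrary choice of error code.

```c
#include <errno.h>
#include <stdbool.h>

struct iommu_model {
	bool te_enabled;	/* translation currently on */
	bool quirk_skip;	/* quirk forbids clearing TE on this unit */
};

/* Returns 0 if translation was turned off, -EBUSY if it was skipped. */
static int model_disable_translation(struct iommu_model *m)
{
	if (m->quirk_skip)
		return -EBUSY;
	m->te_enabled = false;
	return 0;
}

/* Hypothetical suspend-like caller that aborts when disable was skipped. */
static int model_suspend(struct iommu_model *m)
{
	int ret = model_disable_translation(m);

	if (ret)
		return ret;	/* translation is still on, do not proceed */
	/* ... steps that assume translation is off would follow here ... */
	return 0;
}
```

Whether aborting is the right reaction for kdump or suspend is exactly the
open question here; the sketch only shows how the skipped case could be made
visible to the caller.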

I can see your point.

This fix is just to add a new device model to the established quirk
list. All devices (including the new one) in this quirk list have
undergone thorough verification. Therefore, I'd like to keep it as-is.

We can refine the quirk implementation in a separate patch series with
sufficient consideration and verification. Does this work for you?

Best regards,
baolu