RE: [PATCH v7 1/3] iommufd: Add data structure for Intel VT-d stage-1 cache invalidation

From: Tian, Kevin
Date: Thu Dec 14 2023 - 20:50:31 EST


> From: Liu, Yi L <yi.l.liu@xxxxxxxxx>
> Sent: Thursday, December 14, 2023 7:27 PM
>
> On 2023/11/17 21:18, Yi Liu wrote:> This adds the data structure for
> flushing iotlb for the nested domain
>
> +struct iommu_hwpt_vtd_s1_invalidate {
> + __aligned_u64 addr;
> + __aligned_u64 npages;
> + __u32 flags;
> + __u32 __reserved;
> + __u32 error;
> + __u32 dev_id;
> +};
>
> dev_id is used to report the failed device, userspace should be able to map
> it to a vRID, and inject it to VM as part of ITE/ICE error.
>
> However, I got a problem when trying to get dev_id in cache invalidation
> path, since this is filled in intel iommu driver. Seems like there is no
> good way for it. I've below alternatives to move forward, wish you have
> a look.
>
> - Reporting pSID instead of dev_id. This may not work if userspace for
> example Qemu cen get a vfio device cdev fd from management stack. Maybe
> you
> have different opinion, do let me know.

yes, there is no guarantee that pRID is always visible to the user.

>
> - Let iommufd to convert a SID info or device pointer to a dev_id, and then
> report it back to userspace. This seems easiest, but breaks layer and also
> requires vt-d specific logic. :(

yes, the current philosophy of iommufd is to put diver specific knowledge
out of iommufd.

>
> - Reuse Nicolin's vRID->pRID mapping. If thevRID->pRID mapping is
> maintained, then intel iommu can report a vRID back to user. But intel
> iommu driver does not have viommu context, no place to hold the vRID-
> >pRID
> mapping. TBH. It may require other reasons to introduce it other than the
> error reporting need. Anyhow, this requires more thinking and also has
> dependency even if it is doable in intel side.

this sounds like a cleaner way to inject knowledge which iommu driver
requires to find out the user tag. but yes it's a bit weird to introduce
viommu awareness in intel iommu driver when there is no such thing
in real hardware.

and for this error reporting case what we actually require is the
reverse map i.e. pRID->vRID. Not sure whether we can leverage the
same RID mapping uAPI as for ARM/AMD but ignore viommu_id
and then store vRID under device_domain_info. a bit tricky on
life cycle management and also incompatible with SIOV...

let's see whether Jason has a better idea here.

>
> - Only report error code, but no device info at first. May adding the
> device info (dev_id or vRID) in future series. In reality, the existing
> Qemu vIOMMU does not report ICE, ITE, neither the SID to VM. Also, VT-d

and IOAS_UNMAP doesn't provide such ATS error either.

> spec defined the ICE/ITE errors first in 2007 spec 1.1, and added SID info
> later in 2019 spec 3.1. We may do it in stage as well.

and it's not tied to a specific iommu version. the spec is stated in
a way that software treats zero value in SID as no hw support so
theoretically even a modern hw may not report SID for certain
reason.

>
> What about your opinion?
>
> [1]
> https://lore.kernel.org/linux-iommu/a9699f71-805a-4a5a-9282-
> 3ec52e5bc81a@xxxxxxxxx/

I'm fine with this staged approach given the spec allows this
behavior and no vIOMMU properly emulates ITE/ICE today.

let's work out a new version w/o dev info and make sure it's
in a good state first. Then let Jason decide next week whether
he wants to take it for this cycle or not.