RE: [PATCH v7 1/3] iommufd: Add data structure for Intel VT-d stage-1 cache invalidation

From: Tian, Kevin
Date: Thu Jan 04 2024 - 21:53:13 EST


> From: Tian, Kevin
> Sent: Friday, January 5, 2024 10:16 AM
>
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Thursday, January 4, 2024 10:37 PM
> >
> > On Thu, Dec 14, 2023 at 07:26:39PM +0800, Yi Liu wrote:
> > > Per the prior discussion[1], we agreed to move the error reporting into
> the
> > > driver specific part. On Intel side, we want to report two devTLB
> > > invalidation errors: ICE (invalid completion error) and ITE (invalidation
> > > timeout error). Such errors have an additional SID information to tell
> > > which device failed the devTLB invalidation. I've got the below structure.
> >
> > IMHO all of this complexity is a consequence of the decision to hide
> > the devtlb invalidation from the VM..
> >
> > On the other hand I guess you want to do this because of the SIOV
> > troubles where the vPCI function in the VM is entirely virtual and
> > can't be trivially mapped to a real PCI function for ATC invalidation
> > like ARM and AMD can do (but they also can't support SIOV because of
> > this). :(
> >
> > However it also makes it very confusing about how the VM would
> > perceive an error - eg if it invalidates an SIOV device single PASID
> > and that devtlb fails then the error should be connected back to the
> > vPCI function for the SIOV's specific PASID and not back to the
> > physical PCI function for the SIOV owner.
> >
> > As the iommu driver itself has no idea about the vPCI functions this
> > seems like it is going to get really confusing. The API I suggested in
> > the other email is not entirely going to work as the vPCI function for
> > SIOV cases will have to be identified by the (struct device, PASID) -
> > while it would be easy enough for the iommu driver to provide the
> > PASID, I'm not sure how the iommufd core will relate the PASID back to
> > the iommu_device to understand SIOV without actually being aware of
> > SIOV to some degree :\
>
> we plan to add such awareness with a new binding helper:
>
> https://lore.kernel.org/lkml/20231009085123.463179-4-yi.l.liu@xxxxxxxxx/
>
> and with metadata to track association between PASID's and iommufd vdev.
>
> but in reality the relation could be identified in an easy way due to a SIOV
> restriction which we discussed before - shared PASID space of PF disallows
> assigning sibling vdev's to a same VM (otherwise no way to identify which
> sibling vdev triggering an iopf when a pasid is used on both vdev's). That
> restriction implies that within an iommufd context every iommufd_device
> object should contain a unique struct device pointer. So PASID can be
> instead ignored in the lookup then just always do iommufd_get_dev_id()
> using struct device.

A bit more background.

Previously we thought this restriction only applies to SIOV+vSVA, as
a guest process may bind to both sibling vdev's, leading to the same
pasid situation.

In concept w/o vSVA it's still possible to assign sibling vdev's to
a same VM as each vdev is allocated with a unique pasid to mark vRID
so can be differentiated from each other in the fault/error path.

But when looking at this err code issue with Yi closely, we found
there is another gap in the VT-d spec. Upon devtlb invalidation
timeout the hw doesn't report pasid in the error info register. this
makes it impossible to identify the source vdev if a hwpt invalidation
request involves sibling vdev's from a same PF.

with that I'm inclined to always imposing this restriction for SIOV.
One may argue that SIOV w/o vSVA w/o devtlb is conceptually immune
but I'm with you that given SIOVr1 is one-off I prefer to limiting its
usability other than complexing the kernel.