Re: [PATCH v7 1/3] iommufd: Add data structure for Intel VT-d stage-1 cache invalidation

From: Yi Liu
Date: Tue Jan 09 2024 - 00:57:47 EST


On 2024/1/8 12:07, Tian, Kevin wrote:
From: Jason Gunthorpe <jgg@xxxxxxxxxx>
Sent: Friday, January 5, 2024 10:45 PM

On Fri, Jan 05, 2024 at 02:52:50AM +0000, Tian, Kevin wrote:
but in reality the relation could be identified in an easy way due to a SIOV
restriction which we discussed before - shared PASID space of PF
disallows
assigning sibling vdev's to a same VM (otherwise no way to identify which
sibling vdev triggering an iopf when a pasid is used on both vdev's). That
restriction implies that within an iommufd context every iommufd_device
object should contain a unique struct device pointer. So PASID can be
instead ignored in the lookup then just always do iommufd_get_dev_id()
using struct device.

A bit more background.

Previously we thought this restriction only applies to SIOV+vSVA, as
a guest process may bind to both sibling vdev's, leading to the same
pasid situation.

In concept w/o vSVA it's still possible to assign sibling vdev's to
a same VM as each vdev is allocated with a unique pasid to mark vRID
so can be differentiated from each other in the fault/error path.

I thought the SIOV plan was that each "vdev" ie vpci function would
get a slice of the pRID's PASID space statically selected at creation?

So SVA/etc doesn't matter, you reliably get a disjoint set of pRID &
pPASID into each VM.

From that view you can't identify the iommufd dev_id without knowing
both the pRID and pPASID which will disambiguate the different SIOV
iommufd dev_id instances sharing a rid.

true when assigning those instances to different VMs.

Here I was talking about assigning them to a same VM being a problem.
with rid sharing plus same ENQCMD pPASID potentially used on both
instances there'd be ambiguity in vSVA e.g. iopf to identify dev_id.

we agreed before on preventing sibling vdev's in one VM for above
reason, but only as far as vSVA is concerned.

then given the new finding in err reporting I wondered whether this
restriction should be applied to all SIOV scenarios (but irrelevant now
with below explanation after another thinking)


But when looking at this err code issue with Yi closely, we found
there is another gap in the VT-d spec. Upon devtlb invalidation
timeout the hw doesn't report pasid in the error info register. this
makes it impossible to identify the source vdev if a hwpt invalidation
request involves sibling vdev's from a same PF.

Don't you know which command timed out?

unfortunately no.

for errors related to descriptor fetch the driver can tell the command
by looking at the head pointer of the invalidation queue.

command completion is indirectly detected by inserting a wait descriptor
as fence. completion timeout error is reported in an error register. but
this register doesn't record pasid, nor does the command location. if there
are multiple pending devtlb invalidation commands upon timeout
error the spec suggests the driver to treat all of them timeout as the
register can only record one rid.

this is kind of moot. If the driver submits only one command (plus wait)
at a time it doesn't need hw's help to identify the timeout command.
If the driver batches invalidation commands it must treat all timeout if
an timeout error is reported.

from this angle whether to record pasid doesn't really matter.

intel-iommu driver doesn't batch commands. so it's possible for
the driver to figure out the timeout device itself and identify rid plus
pasid to find dev_id from iommufd.

based on this, even RID is unnecessary. Software should know which device
it has sent a devTLB invalidation.

--
Regards,
Yi Liu