RE: [RFC] /dev/ioasid uAPI proposal

From: Tian, Kevin
Date: Thu Jun 03 2021 - 22:16:14 EST


> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Thursday, June 3, 2021 7:47 PM
>
> On Thu, Jun 03, 2021 at 06:49:20AM +0000, Tian, Kevin wrote:
> > > From: David Gibson
> > > Sent: Thursday, June 3, 2021 1:09 PM
> > [...]
> > > > > In this way the SW mode is the same as a HW mode with an infinite
> > > > > cache.
> > > > >
> > > > > The collaposed shadow page table is really just a cache.
> > > > >
> > > >
> > > > OK. One additional thing is that we may need a 'caching_mode"
> > > > thing reported by /dev/ioasid, indicating whether invalidation is
> > > > required when changing non-present to present. For hardware
> > > > nesting it's not reported as the hardware IOMMU will walk the
> > > > guest page table in cases of iotlb miss. For software nesting
> > > > caching_mode is reported so the user must issue invalidation
> > > > upon any change in guest page table so the kernel can update
> > > > the shadow page table timely.
> > >
> > > For the fist cut, I'd have the API assume that invalidates are
> > > *always* required. Some bypass to avoid them in cases where they're
> > > not needed can be an additional extension.
> > >
> >
> > Isn't a typical TLB semantics is that non-present entries are not
> > cached thus invalidation is not required when making non-present
> > to present? It's true to both CPU TLB and IOMMU TLB. In reality
> > I feel there are more usages built on hardware nesting than software
> > nesting thus making default following hardware TLB behavior makes
> > more sense...
>
> From a modelling perspective it makes sense to have the most general
> be the default and if an implementation can elide certain steps then
> describing those as additional behaviors on the universal baseline is
> cleaner
>
> I'm surprised to hear your remarks about the not-present though,
> how does the vIOMMU emulation work if there are not hypervisor
> invalidation traps for not-present/present transitions?
>

Such invalidation traps matter only for shadow I/O page table (software
nesting). For hardware nesting no trap is required for non-present/
present transition since physical IOTLB doesn't cache non-present
entries. The IOMMU will walk the guest I/O page table in case of IOTLB
miss.

The vIOMMU should be constructed according to whether software
or hardware nesting is used. For Intel (and AMD iirc), a caching_mode
capability decides whether the guest needs to do invalidation for
non-present/present transition. Such vIOMMU should clear this bit
for hardware nesting or set it for software nesting. ARM SMMU doesn't
have this capability. Therefore their vSMMU can only work with a
hardware nested IOASID.

Thanks
Kevin