Re: [PATCH v2 00/11] iommufd: Add nesting infrastructure

From: Jason Gunthorpe
Date: Mon Jun 19 2023 - 08:37:16 EST


On Fri, Jun 16, 2023 at 02:43:13AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Tuesday, June 6, 2023 10:18 PM
> >
> > > > > It's not necessary to add reserved regions to the IOAS of the parent
> > > > > hwpt since the device doesn't access that address space after it's
> > > > > attached to stage-1. The parent is used only for address translation
> > > > > in the iommu side.
> > > >
> > > > But if we don't put them in the IOAS of the parent there is no way for
> > > > userspace to learn what they are to forward to the VM ?
> > >
> > > emmm I wonder whether that is the right interface to report
> > > per-device reserved regions.
> >
> > The iommu driver needs to report different reserved regions for the S1
> > and S2 iommu_domains,
>
> I can see the difference between RID and RID+PASID, but not sure whether
> it's an actual requirement regarding the attached domain.

No, it isn't RID or RID+PASID here

The S2 has a different set of reserved regions than the S1 because
the S2's IOVA does not appear on the bus.

So the S2's reserved regions are entirely an artifact of how the IOMMU
HW itself works when nesting.

We can probably get by with some documented, slightly messy rules:
reserved_regions apply only to domains attached directly to a RID. S2
and PASID attachments never have reserved regions.
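
To make that rule concrete, here is a rough, untested userspace sketch.
Every name in it (attach_kind, resv_range, iova_usable) is made up for
illustration and is not part of iommufd or any driver. The only thing it
encodes is the rule above: the reserved list is consulted for a direct
RID attachment and ignored otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attachment kinds, not taken from the iommufd code. */
enum attach_kind {
	ATTACH_RID_DIRECT,	/* domain attached directly to the RID */
	ATTACH_S2_PARENT,	/* S2 used only as the parent of a nested S1 */
	ATTACH_PASID,		/* domain attached to a RID+PASID */
};

/* Hypothetical inclusive IOVA range [start, last]. */
struct resv_range {
	uint64_t start;
	uint64_t last;
};

/*
 * Return true if @iova may be mapped in a domain attached as @kind.
 * Only a direct RID attachment is constrained by the reserved list;
 * S2 parents and PASID attachments are treated as having none.
 */
static bool iova_usable(enum attach_kind kind,
			const struct resv_range *resv, size_t nr,
			uint64_t iova)
{
	if (kind != ATTACH_RID_DIRECT)
		return true;

	for (size_t i = 0; i < nr; i++) {
		assert(resv[i].start <= resv[i].last);
		if (iova >= resv[i].start && iova <= resv[i].last)
			return false;
	}
	return true;
}
```

With a single reserved range such as the x86 MSI window
(0xfee00000-0xfeefffff, used here only as an example value), an IOVA
inside that window is rejected for ATTACH_RID_DIRECT and accepted for
the other two kinds.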

> When talking about RID-based nesting alone, ARM needs to add reserved
> regions to the parent IOAS as identity is a valid S1 mode in nesting.

No, definitely not. The S2 has no reserved regions because it is an
internal IOVA, and we should not abuse that.

Reflecting the requirements for an identity map is something all iommu
HW needs to handle; we should figure out how to do that properly.

> But for Intel RID nesting excludes identity (which becomes a direct
> attach to S2) so the reserved regions apply to S1 instead of the parent IOAS.

IIRC all the HW models will assign their S2's as a RID attached "S1"
during boot time to emulate "no translation"?

They all need to learn what the allowed identity mapping is so that the
VMM can construct a compatible guest address space, independently of
any IOAS restrictions.
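
On the VMM side that check could look something like the untested
sketch below. The names (addr_range, identity_layout_ok) are invented
for this mail, and how the kernel would actually report the ranges that
an identity map must avoid is exactly the open question, so the sketch
simply assumes the VMM has been handed such a list somehow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inclusive address range [start, last]. */
struct addr_range {
	uint64_t start;
	uint64_t last;
};

/* Two inclusive ranges overlap if each one starts before the other ends. */
static bool ranges_overlap(const struct addr_range *a,
			   const struct addr_range *b)
{
	return a->start <= b->last && b->start <= a->last;
}

/*
 * Return true if none of the guest's RAM ranges touch a range that an
 * identity mapping must avoid. @resv is assumed to come from whatever
 * interface ends up reporting the allowed identity mapping; it is not
 * derived from the parent IOAS here.
 */
static bool identity_layout_ok(const struct addr_range *guest, size_t nr_guest,
			       const struct addr_range *resv, size_t nr_resv)
{
	for (size_t i = 0; i < nr_guest; i++) {
		assert(guest[i].start <= guest[i].last);
		for (size_t j = 0; j < nr_resv; j++) {
			assert(resv[j].start <= resv[j].last);
			if (ranges_overlap(&guest[i], &resv[j]))
				return false;
		}
	}
	return true;
}
```

A VMM would run this against its planned guest memory map before
starting the VM and punch holes in the layout if it fails.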

Jason