Re: [PATCH v6 0/6] iommufd: Add nesting infrastructure (part 2/2)

From: Jason Gunthorpe
Date: Tue Dec 12 2023 - 14:21:09 EST


On Tue, Dec 12, 2023 at 11:13:37AM -0800, Nicolin Chen wrote:
> On Tue, Dec 12, 2023 at 10:44:21AM -0400, Jason Gunthorpe wrote:
> > On Mon, Dec 11, 2023 at 11:30:00PM -0800, Nicolin Chen wrote:
> >
> > > > > Could the structure just look like this?
> > > > > struct iommu_dev_assign_virtual_id {
> > > > >         __u32 size;
> > > > >         __u32 dev_id;
> > > > >         __u32 id_type;
> > > > >         __u32 id;
> > > > > };
> > > >
> > > > It needs to take in the viommu_id also, and I'd make the id 64 bits
> > > > just for good luck.
> > >
> > > What is the viommu_id required for in this context? I thought we
> > > already know which SMMU instance to issue commands to via the dev_id?
> >
> > The viommu_id would be the container that holds the xarray that maps
> > the vRID to pRID
> >
> > Logically we could have multiple mappings per iommufd as we could have
> > multiple iommu instances working here.
>
> I see. This is the object to hold a shared stage-2 HWPT/domain then.

It could be done like that, yes. I wasn't thinking of linking the
stage-2 so tightly, but perhaps? If we can avoid putting the hwpt
here, that might be more general.

> // iommufd_private.h
>
> enum iommufd_object_type {
> ...
> +        IOMMUFD_OBJ_VIOMMU,
> ...
> };
>
> +struct iommufd_viommu {
> +        struct iommufd_object obj;
> +        struct iommufd_hwpt_paging *hwpt;
> +        struct xarray devices;
> +};
>
> struct iommufd_hwpt_paging {
> ...
> +        struct list_head viommu_list;
> ...
> };

I'd probably first try to go backwards and link the hwpt to the
viommu.
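
I.e. something like this, reusing the names from your sketch (just a
sketch, nothing settled):

struct iommufd_viommu {
        struct iommufd_object obj;
        struct iommufd_hwpt_paging *hwpt;       /* viommu holds the stage-2 */
        struct xarray devices;                  /* vRID -> pRID */
};

And leave iommufd_hwpt_paging alone, no viommu_list needed.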

> struct iommufd_group {
> ...
> +        struct iommufd_viommu *viommu; // should we attach to viommu instead of hwpt?
> ...
> };

No. Attach is a statement of translation, so you still attach to the HWPT.


> A question to finalize how we map vRID->pRID in the xarray:
> how should IOMMUFD_DEV_INVALIDATE work? The ioctl structure has
> a dev_id and a list of commands that belong to the device. So,
> it forwards the struct device pointer to the driver along with
> the commands. Then, doesn't the driver already know the pRID
> from the dev pointer without looking up a vRID->pRID table?

The first version of DEV_INVALIDATE should have no xarray. The
invalidate commands are stripped of their SIDs and executed on the
given dev_id, period. The VMM splits up the invalidate command list.

For the second version maybe we have the xarray, or maybe we just
push the xarray out to the eventual viommu series.
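
To make the first version concrete, perhaps something like this (all
field names below are illustrative only, not a settled uAPI):

struct iommu_dev_invalidate {
        __u32 size;
        __u32 dev_id;
        __u32 data_type;        /* driver specific command format */
        __u32 entry_len;        /* bytes per invalidation command */
        __u32 entry_num;        /* in: commands given, out: commands done */
        __u32 __reserved;
        __aligned_u64 data_uptr; /* command array, SIDs already stripped */
};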

> struct iommu_hwpt_alloc {
> ...
> +        __u32 viommu_id;
> };
>
> +enum iommu_dev_virtual_id_type {
> +        IOMMU_DEV_VIRTUAL_ID_TYPE_AMD_VIOMMU_DID, // not sure how this fits the xarray in viommu obj.
> +        IOMMU_DEV_VIRTUAL_ID_TYPE_AMD_VIOMMU_RID,

It is just DID. In both cases the ID is the index to the "STE" radix
tree, whatever the driver happens to call it.
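
If/when the xarray shows up, the driver side lookup is then trivial.
A hypothetical helper, assuming the assign ioctl stores entries with
xa_mk_value():

/* Resolve a guest visible index (vSID/vDID) to the physical ID */
static u32 iommufd_viommu_vid_to_pid(struct iommufd_viommu *viommu,
                                     unsigned long vid)
{
        void *entry = xa_load(&viommu->devices, vid);

        return entry ? xa_to_value(entry) : 0;
}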

> Then I think we also need an iommu_viommu_alloc structure and
> ioctl to allocate the object, and the VMM should know if it
> needs to allocate multiple viommu objects -- this probably needs
> the hw_info ioctl to return a piommu_id so the VMM gets the list
> of piommus from the attached devices?

Yes and yes
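
Roughly, maybe (all names and fields below are placeholders, not a
settled uAPI):

struct iommu_viommu_alloc {
        __u32 size;
        __u32 flags;
        __u32 dev_id;           /* any device behind the physical IOMMU */
        __u32 hwpt_id;          /* shared stage-2, if we link it at alloc */
        __u32 out_viommu_id;
        __u32 __reserved;
};

Plus an out_piommu_id (or similar) in the hw_info report so the VMM
can group devices by physical IOMMU instance.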

> Another question: introducing the viommu obj complicates things
> a lot. Do we want it to come with iommu_dev_assign_virtual_id,
> or maybe put it in a later series? We could stage the xarray in
> the iommufd_hwpt_paging struct for now, so a single-IOMMU system
> could still work with that.

All this would be in its own series enabling HW-accelerated viommu
support on ARM & AMD, as we've been doing so far.

I imagine it coming after we get the basic invalidation done.

> > > And should we rename "cache_invalidate_user"? Would VT-d
> > > still use it for device cache?
> >
> > I think vt-d will not implement it
>
> Then should we "s/cache_invalidate_user/iotlb_sync_user"?

I think cache_invalidate is still a fine name. VT-d will generate ATC
invalidations under that function too.

Jason