RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

From: Tian, Kevin
Date: Wed Apr 28 2021 - 02:34:50 EST


> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Monday, April 26, 2021 8:38 PM
>
[...]
> > Want to hear your opinion for one open here. There is no doubt that
> > an ioasid represents a HW page table when the table is constructed by
> > userspace and then linked to the IOMMU through the bind/unbind
> > API. But I'm not very sure about whether an ioasid should represent
> > the exact pgtable or the mapping metadata when the underlying
> > pgtable is indirectly constructed through map/unmap API. VFIO does
> > the latter way, which is why it allows multiple incompatible domains
> > in a single container which all share the same mapping metadata.
>
> I think VFIO's map/unmap is way too complex and we know it has bad
> performance problems.

Can you or Alex elaborate where the complexity and performance problem
locate in VFIO map/umap? We'd like to understand more detail and see how
to avoid it in the new interface.

>
> If /dev/ioasid is single HW page table only then I would focus on that
> implementation and leave it to userspace to span different
> /dev/ioasids if needed.
>
> > OK, now I see where the disconnection comes from. In my context ioasid
> > is the identifier that is actually used in the wire, but seems you treat it as
> > a sw-defined namespace purely for representing page tables. We should
> > clear this concept first before further discussing other details. 😊
>
> There is no general HW requirement that every IO page table be
> referred to by the same PASID and this API would have to support

Yes, but what is the value of allowing multiple PASIDs referring to the
the same I/O page table (except the nesting pgtable case)? Doesn't it
lead to poor iotlb efficiency issue similar to multiple iommu domains
referring to the same page table?

> non-PASID IO page tables as well. So I'd keep the two things
> separated in the uAPI - even though the kernel today has a global
> PASID pool.

for non-PASID usages the allocated PASID will be wasted if we don't
separate ioasid from pasid. But it may be worthwhile given 1m available
pasids and the simplification in the uAPI which only needs to care about
one id space then.

>
> > Then following your proposal, does it mean that we need another
> > interface for allocating PASID? and since ioasid means different
> > thing in uAPI and in-kernel API, possibly a new name is required to
> > avoid confusion?
>
> I would suggest have two ways to control the PASID
>
> 1) Over /dev/ioasid allocate a PASID for an IOASID. All future PASID
> based usages of the IOASID will use that global PASID
>
> 2) Over the device FD, when the IOASID is bound return the PASID that
> was selected. If the IOASID does not have a global PASID then the
> kernel is free to make something up. In this mode a single IOASID
> can have multiple PASIDs.
>
> Simple things like DPDK can use #2 and potentially have better PASID
> limits. hypervisors will most likely have to use #1, but it depends on
> how their vIOMMU interface works.

Can you elaborate why DPDK wants to use #2 i.e. not using a global
PASID?

>
> I think the name IOASID is fine for the uAPI, the kernel version can
> be called ioasid_id or something.

ioasid is already an id and then ioasid_id just adds confusion. Another
point is that ioasid is currently used to represent both PCI PASID and
ARM substream ID in the kernel. It implies that if we want to separate
ioasid and pasid in the uAPI the 'pasid' also needs to be replaced with
another general term usable for substream ID. Are we making the
terms too confusing here?

>
> (also looking at ioasid.c, why do we need such a thin and odd wrapper
> around xarray?)
>

I'll leave it to Jean and Jacob.

Thanks
Kevin