Re: [RFC] /dev/ioasid uAPI proposal

From: David Gibson
Date: Thu Jun 03 2021 - 02:28:34 EST


On Wed, Jun 02, 2021 at 01:16:48PM -0300, Jason Gunthorpe wrote:
> On Wed, Jun 02, 2021 at 04:32:27PM +1000, David Gibson wrote:
> > > I agree with Jean-Philippe - at the very least erasing this
> > > information needs a major rationale - but I don't really see why it
> > > must be erased? The HW reports the originating device, is it just a
> > > matter of labeling the devices attached to the /dev/ioasid FD so it
> > > can be reported to userspace?
> >
> > HW reports the originating device as far as it knows. In many cases
> > where you have multiple devices in an IOMMU group, it's because
> > although they're treated as separate devices at the kernel level, they
> > have the same RID at the HW level. That means a RID for something in
> > the right group is the closest you can count on supplying.
>
> Granted there may be cases where exact fidelity is not possible, but
> that doesn't excuse eliminating fidelity where it does exist.
>
> > > If there are no hypervisor traps (does this exist?) then there is no
> > > way to involve the hypervisor here and the child IOASID should simply
> > > be a pointer to the guest's data structure that describes binding. In
> > > this case that IOASID should claim all PASIDs when bound to a
> > > RID.
> >
> > And in that case I think we should call that object something other
> > than an IOASID, since it represents multiple address spaces.
>
> Maybe.. It is certainly a special case.
>
> We can still consider it a single "address space" from the IOMMU
> perspective. What has happened is that the address table is not just a
> 64 bit IOVA, but an extended ~80 bit IOVA formed by "PASID, IOVA".

True. This does complexify how we represent what IOVA ranges are
valid, though. I'll bet most implementations don't actually
implement a full 64-bit IOVA, which means we effectively have a large
number of windows from (0..max IOVA), one for each valid PASID. This is
another reason I don't think my concept of IOVA windows is just a
POWER-specific thing.

> If we are already going in the direction of having the IOASID specify
> the page table format and other details, specifying that the page
> table format is the 80 bit "PASID, IOVA" format is a fairly small
> step.

Well, rather, I think userspace needs to request the page table format
it wants, and the kernel tells it whether it can oblige or not.

> I wouldn't twist things into knots to create a difference, but if it
> is easy to do it wouldn't hurt either.
>
> Jason
>

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
