Re: [RFC] /dev/ioasid uAPI proposal

From: Jason Gunthorpe
Date: Tue Jun 08 2021 - 15:33:58 EST


On Tue, Jun 08, 2021 at 04:04:26PM +1000, David Gibson wrote:

> > What I would like is that the /dev/iommu side managing the IOASID
> > doesn't really care much, but the device driver has to tell
> > drivers/iommu what it is going to do when it attaches.
>
> By the device driver, do you mean the userspace or guest device
> driver? Or do you mean the vfio_pci or mdev "shim" device driver?

I mean vfio_pci, mdev "shim", vdpa, etc. Some kernel driver that is
allowing qemu access to a HW resource.

> > It makes sense, in PCI terms, only the driver knows what TLPs the
> > device will generate. The IOMMU needs to know what TLPs it will
> > receive to configure properly.
> >
> > PASID or not is a major device specific variation, as is ENQCMD/etc.
> >
> > Having the device be explicit when it tells the IOMMU what it is going
> > to be sending is a major plus to me. I actually don't want to see this
> > part of the interface be made less strong.
>
> Ok, if I'm understanding this right a PASID capable IOMMU will be able
> to process *both* transactions with just a RID and transactions with a
> RID+PASID.

Yes

> So if we're thinking of this notional 84ish-bit address space, then
> that includes "no PASID" as well as all the possible PASID values.
> Yes? Or am I confused?

Right, though I expect how to model 'no pasid' vs all the pasids is
some micro-detail someone would need to work on a real vIOMMU
implementation to decide.

> > /dev/iommu is concerned with setting up the IOAS and filling the IO
> > page tables with information
> >
> > The driver behind "struct vfio_device" is responsible to "route" its
> > HW into that IOAS.
> >
> > They are two halves of the problem, one is only the io page table, and one
> > is the connection of a PCI TLP to a specific io page table.
> >
> > Only the driver knows what format of TLPs the device will generate so
> > only the driver can specify the "route"
>
> Ok. I'd really like if we can encode this in a way that doesn't build
> PCI-specific structure into the API, though.

I think we should at least have bus specific "convenience" APIs for
the popular cases. It is complicated enough already, trying to force
people to figure out the kernel synonym for a PCI standard name gets
pretty rough... Plus the RID is inherently a hardware specific
concept.
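
To illustrate what I mean by a "convenience" API, here is a rough
sketch of a route descriptor with PCI-named helpers. Every name in it
(ioas_route, route_kind, pci_rid, pci_route_rid, pci_route_rid_pasid)
is made up for this mail and is not an existing or proposed kernel
interface. The only facts it relies on are that a PCI RID is 16 bits
(bus/device/function) and a PASID is 20 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is an existing kernel API. */
enum route_kind {
	ROUTE_PCI_RID,       /* TLPs carry only a Requester ID */
	ROUTE_PCI_RID_PASID, /* TLPs carry a Requester ID plus a PASID */
};

struct ioas_route {
	enum route_kind kind;
	uint16_t rid;   /* bus[15:8] device[7:3] function[2:0] */
	uint32_t pasid; /* 20 bits, meaningful only for ROUTE_PCI_RID_PASID */
};

#define PCI_PASID_MAX ((1u << 20) - 1)

/* Build a 16-bit RID from bus/device/function numbers. */
static inline uint16_t pci_rid(uint8_t bus, uint8_t dev, uint8_t fn)
{
	return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x7));
}

/* The driver declares it will only ever send untagged (no PASID) TLPs. */
static inline struct ioas_route pci_route_rid(uint16_t rid)
{
	struct ioas_route r = { .kind = ROUTE_PCI_RID, .rid = rid, .pasid = 0 };
	return r;
}

/* The driver declares RID+PASID TLPs; rejects an out-of-range PASID. */
static inline bool pci_route_rid_pasid(struct ioas_route *out, uint16_t rid,
				       uint32_t pasid)
{
	assert(out != NULL);
	if (pasid > PCI_PASID_MAX)
		return false;
	out->kind = ROUTE_PCI_RID_PASID;
	out->rid = rid;
	out->pasid = pasid;
	return true;
}
```

The point is only that the caller writes "RID" and "PASID", the names
from the PCI spec, and the generic descriptor underneath stays free to
grow other bus types.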

> > Inability to match the RID is rare, certainly I would expect any IOMMU
> > HW that can do PCIe PASID matching can also do RID matching.
>
> It's not just up to the IOMMU. The obvious case is a PCIe-to-PCI
> bridge.

Yes.. but PCI is *really* old at this point. Even PCI-X sustains the
originating RID.

The general case here is that each device can route to its own
IOAS. The specialty case is that only one IOAS in a group can be
used. We should make room in the API for the special case without
destroying the general case.
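
As a toy model of that, and nothing more: the sketch below lets every
device attach to its own IOAS unless its group is flagged as only
supporting a single one, in which case a conflicting attach fails and
a matching attach is effectively a NOP. The struct and function names
(group, device, device_attach, single_ioas_only) and the choice of
-EBUSY are assumptions for illustration, not existing kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define IOAS_NONE (-1)

/* Hypothetical model: a group that may be limited to one IOAS. */
struct group {
	int ioas;              /* IOAS shared by the group, or IOAS_NONE */
	bool single_ioas_only; /* special case: RIDs cannot be told apart */
};

struct device {
	struct group *grp;
	int ioas; /* IOAS this device is routed to, or IOAS_NONE */
};

/* Device centric attach: the group only matters in the special case. */
int device_attach(struct device *dev, int ioas)
{
	assert(dev != NULL && dev->grp != NULL);
	struct group *g = dev->grp;

	if (g->single_ioas_only) {
		/* Everything in the group must land on the same IOAS. */
		if (g->ioas != IOAS_NONE && g->ioas != ioas)
			return -EBUSY;
		g->ioas = ioas;
	}
	dev->ioas = ioas;
	return 0;
}
```

In the general case the group check never fires, so the device centric
call costs nothing extra; only the restricted groups see the
constraint.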

> > Oh, I hadn't spent time thinking about any of those.. It is messy but
> > it can still be forced to work, I guess. A device centric model means
> > all the devices using the same routing ID have to be connected to the
> > same IOASID by userspace. So some of the connections will be NOPs.
>
> See, that's exactly what I thought the group checks were enforcing.
> I'm really hoping we don't need two levels of granularity here: groups
> of devices that can't be identified from each other, and then groups
> of those that can't be isolated from each other. That introduces a
> huge amount of extra conceptual complexity.

We've got this far with groups that mean all those things together, I
wouldn't propose to do a bunch of kernel work to change that
significantly.

I just want to have a device centric uAPI so we are not trapped
forever in groups being 1:1 with an IOASID model, which is clearly not
accurately modeling what today's systems are actually able to do,
especially with PASID.

We can report some fixed info to user space, e.g. 'all these devices
share one ioasid', and leave it at that for now/ever.

Jason