Re: [RFC 11/20] iommu/iommufd: Add IOMMU_IOASID_ALLOC/FREE

From: Jason Gunthorpe
Date: Fri Oct 01 2021 - 08:22:32 EST


On Fri, Oct 01, 2021 at 04:13:58PM +1000, David Gibson wrote:
> On Tue, Sep 21, 2021 at 02:44:38PM -0300, Jason Gunthorpe wrote:
> > On Sun, Sep 19, 2021 at 02:38:39PM +0800, Liu Yi L wrote:
> > > This patch adds IOASID allocation/free interface per iommufd. When
> > > allocating an IOASID, userspace is expected to specify the type and
> > > format information for the target I/O page table.
> > >
> > > This RFC supports only one type (IOMMU_IOASID_TYPE_KERNEL_TYPE1V2),
> > > implying a kernel-managed I/O page table with vfio type1v2 mapping
> > > semantics. For this type the user should specify the addr_width of
> > > the I/O address space and whether the I/O page table is created in
> > > an iommu enforce_snoop format. enforce_snoop must be true at this point,
> > > as the false setting requires additional contract with KVM on handling
> > > WBINVD emulation, which can be added later.
> > >
> > > Userspace is expected to call IOMMU_CHECK_EXTENSION (see next patch)
> > > for what formats can be specified when allocating an IOASID.
> > >
> > > Open:
> > > - Devices on PPC platform currently use a different iommu driver in vfio.
> > > Per previous discussion they can also use vfio type1v2 as long as there
> > > is a way to claim a specific iova range from a system-wide address space.
> > > This requirement doesn't sound PPC specific, as addr_width for pci devices
> > > can be also represented by a range [0, 2^addr_width-1]. This RFC hasn't
> > > adopted this design yet. We hope to have formal alignment in v1 discussion
> > > and then decide how to incorporate it in v2.
> >
> > I think the request was to include a start/end IO address hint when
> > creating the ios. When the kernel creates it then it can return the
> > actual geometry including any holes via a query.
>
> So part of the point of specifying start/end addresses is that
> explicitly querying holes shouldn't be necessary: if the requested
> range crosses a hole, it should fail. If you didn't really need all
> that range, you shouldn't have asked for it.
>
> Which means these aren't really "hints" but optionally supplied
> constraints.

We have to be very careful here: there are two very different use
cases. When we are talking about the generic API I am mostly
interested to see that applications like DPDK can use this API and be
portable to any IOMMU HW the kernel supports. I view the fact that
there is VFIO PPC specific code in DPDK as a failing of the kernel to
provide a HW abstraction.

This means we cannot define an input that has a magic HW specific
value. DPDK can never provide that portably. Thus all these kinds of
inputs in the generic API need to be hints, if they exist at all.

As 'address space size hint'/'address space start hint' are generic,
useful, and providable by DPDK, I think they are OK. PPC can use them
to pick which of the two page table formats to use for this IOAS if
it wants.
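
To make the hint-versus-constraint distinction concrete, here is a
minimal sketch in plain C. Nothing in it is real uAPI: struct
ioas_request, ioas_alloc_hint(), ioas_alloc_exact() and the window
values used with them are all made up for illustration, and the
selection policy is just one plausible choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one IOVA window the IOMMU HW can offer (inclusive). */
struct iova_range {
	uint64_t start;
	uint64_t last;
};

/* Hypothetical request from userspace: start address and size in bytes. */
struct ioas_request {
	uint64_t start;
	uint64_t size;
};

/* True if the requested range lies entirely inside window w. */
static bool window_contains(const struct iova_range *w,
			    const struct ioas_request *req)
{
	uint64_t last = req->start + req->size - 1;

	/* Reject empty requests and ranges that wrap past 2^64. */
	if (req->size == 0 || last < req->start)
		return false;
	return req->start >= w->start && last <= w->last;
}

/*
 * Constraint semantics: succeed only if some window holds the exact
 * range asked for; otherwise fail with -1.
 */
static int ioas_alloc_exact(const struct iova_range *hw, size_t n,
			    const struct ioas_request *req,
			    struct iova_range *out)
{
	size_t i;

	assert(hw && req && out);
	for (i = 0; i < n; i++) {
		if (window_contains(&hw[i], req)) {
			out->start = req->start;
			out->last = req->start + req->size - 1;
			return 0;
		}
	}
	return -1;
}

/*
 * Hint semantics: never fail just because the hint does not fit.
 * Prefer a window containing the hinted range, then any window big
 * enough for the hinted size, then fall back to the first window.
 * The caller learns the actual geometry from *out.
 */
static int ioas_alloc_hint(const struct iova_range *hw, size_t n,
			   const struct ioas_request *req,
			   struct iova_range *out)
{
	size_t i;

	assert(hw && req && out);
	if (n == 0)
		return -1;
	for (i = 0; i < n; i++) {
		if (window_contains(&hw[i], req)) {
			*out = hw[i];
			return 0;
		}
	}
	for (i = 0; i < n; i++) {
		if (req->size != 0 &&
		    hw[i].last - hw[i].start >= req->size - 1) {
			*out = hw[i];
			return 0;
		}
	}
	*out = hw[0];
	return 0;
}
```

With two illustrative windows, say a low one at [0, 2^31-1] and a high
one starting at 2^59, a request for 4GiB at IOVA 4GiB fails under
ioas_alloc_exact() but succeeds under ioas_alloc_hint(), which hands
back the high window. A portable application like DPDK can only live
with the second behaviour, since it has no way to know the windows in
advance.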

The second use case is when we have a userspace driver for a specific
HW IOMMU, e.g. a vIOMMU in qemu doing specific PPC/ARM/x86 acceleration.
We can look here for things to make general, but I would expect a
fairly high bar. Instead, I would rather see the userspace driver
communicate with the kernel driver in its own private language, so
that the entire functionality of the unique HW can be used.

So, when it comes to providing exact ranges as an input parameter we
have to decide if that is done as some additional general data, or if
it should be part of an IOAS_FORMAT_KERNEL_PPC. In this case I suggest
the guiding factor should be if every single IOMMU implementation can
be updated to support the value.

Jason