Re: [PATCH v2 1/3] iommu/uapi: Define uapi version and capabilities

From: Alex Williamson
Date: Mon Apr 13 2020 - 18:21:56 EST


On Mon, 13 Apr 2020 13:41:57 -0700
Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx> wrote:

> Hi All,
>
> Just a gentle reminder, any feedback on the options I listed below? New
> ideas will be even better.
>
> Christoph, does the explanation make sense to you? We do have the
> capability/flag based scheme for IOMMU API extension, the version is
> mainly used for size lookup. Compatibility checking is another use of
> the version, it makes checking easy when a vIOMMU is launched.
>
> Thanks,
>
> Jacob
>
> On Thu, 2 Apr 2020 11:36:04 -0700
> Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx> wrote:
>
> > On Wed, 1 Apr 2020 05:32:21 +0000
> > "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> >
> > > > From: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
> > > > Sent: Tuesday, March 31, 2020 11:55 PM
> > > >
> > > > On Tue, 31 Mar 2020 06:06:38 +0000
> > > > "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> > > >
> > > > > > From: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
> > > > > > Sent: Tuesday, March 31, 2020 12:08 AM
> > > > > >
> > > > > > On Mon, 30 Mar 2020 05:40:40 +0000
> > > > > > "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> > > > > >
> > > > > > > > From: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
> > > > > > > > Sent: Saturday, March 28, 2020 7:54 AM
> > > > > > > >
> > > > > > > > On Fri, 27 Mar 2020 00:47:02 -0700
> > > > > > > > Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > > On Fri, Mar 27, 2020 at 02:49:55AM +0000, Tian, Kevin
> > > > > > > > > wrote:
> > > > > > > > > > If those API calls are inter-dependent for composing a
> > > > > > > > > > feature (e.g. SVA), shouldn't we need a way to check
> > > > > > > > > > them together before exposing the feature to the
> > > > > > > > > > guest, e.g. through a iommu_get_uapi_capabilities
> > > > > > > > > > interface?
> > > > > > > > >
> > > > > > > > > Yes, that makes sense. The important bit is to have a
> > > > > > > > > capability flags and not version numbers.
> > > > > > > >
> > > > > > > > The challenge is that there are two consumers in the
> > > > > > > > kernel for this. 1. VFIO only look for compatibility, and
> > > > > > > > size of each data struct such that it can copy_from_user.
> > > > > > > >
> > > > > > > > 2. IOMMU driver, the "real consumer" of the content.
> > > > > > > >
> > > > > > > > For 2, I agree and we do plan to use the capability flags
> > > > > > > > to check content and maintain backward compatibility etc.
> > > > > > > >
> > > > > > > > For VFIO, it is difficult to do size look up based on
> > > > > > > > capability flags.
> > > > > > >
> > > > > > > Can you elaborate the difficulty in VFIO? if, as Christoph
> > > > > > > Hellwig pointed out, version number is already avoided
> > > > > > > everywhere, it is interesting to know whether this work
> > > > > > > becomes a real exception or just requires a different
> > > > > > > mindset.
> > > > > > From VFIO p.o.v. the IOMMU UAPI data is opaque, it only needs
> > > > > > to do two things:
> > > > > > 1. is the UAPI compatible?
> > > > > > 2. what is the size to copy?
> > > > > >
> > > > > > If you look at the version number, this is really a "version
> > > > > > as size" lookup, as provided by the helper function in this
> > > > > > patch. An example can be the newly introduced clone3 syscall.
> > > > > > https://lwn.net/Articles/792628/
> > > > > > In clone3, new version must have new size. The slight
> > > > > > difference here is that, unlike clone3, we have multiple data
> > > > > > structures instead of a single struct clone_args {}. And each
> > > > > > struct has flags to enumerate its contents besides size.
> > > > >
> > > > > Thanks for providing that link. However clone3 doesn't include a
> > > > > version field to do "version as size" lookup. Instead, as you
> > > > > said, it includes a size parameter which sounds like the option
> > > > > 3 (argsz) listed below.
> > > > >
> > > > Right, there is no version in clone3. size = version. I view this
> > > > as a 1:1 lookup.
> > > >
> > > > > >
> > > > > > Besides breaching data abstraction, if VFIO has to check IOMMU
> > > > > > flags to determine the sizes, it has many combinations.
> > > > > >
> > > > > > We also separate the responsibilities into two parts
> > > > > > 1. compatibility - version, size by VFIO
> > > > > > 2. sanity check - capability flags - by IOMMU
> > > > >
> > > > > I feel argsz+flags approach can perfectly meet above
> > > > > requirement. The userspace set the size and flags for whatever
> > > > > capabilities it uses, and VFIO simply copies the parameters by
> > > > > size and pass to IOMMU for further sanity check. Of course the
> > > > > assumption is that we do provide an interface for userspace to
> > > > > enumerate all supported capabilities.
> > > > You cannot trust user for argsz. the size to be copied from user
> > > > must be based on knowledge in kernel. That is why we have this
> > > > version to size lookup.
> > > >
> > > > In VFIO, the size to copy is based on knowledge of each VFIO UAPI
> > > > structures and VFIO flags. But here the flags are IOMMU UAPI
> > > > flags. As you pointed out in another thread, VFIO is one user.
> > >
> > > If that is the case, can we let VFIO only copy its own UAPI fields
> > > while simply passing the user pointer of IOMMU UAPI structure to
> > > IOMMU driver for further size check and copy? Otherwise we are
> > > entering a dead end that VFIO doesn't want to parse a structure
> > > which is not defined by him while using version to represent the
> > > black box size is considered as a discarded scheme and doesn't
> > > scale well...
> > I think this could be an other viable option. Let me try to summarize
> > since this has been a long discussion since the original version.
> >
> > Problem statements:
> > 1. When launching vIOMMU in the guest, how can we ensure the host has
> > compatible support upfront? as compared to fail later.

This sounds like a feature/extension interface, both KVM and vfio have
them to allow userspace to check support of specific features.

> > 2. As UAPI data gets extended (both in size and flags), how can we
> > know the size to copy

For vfio we of course use the argsz/flags trick where the user tells us
how big the buffer is and flags in the header tell us what fields
beyond the base specification are enabled. This can get tricky to
extend and there can be confusion whether a flag indicates the presence
of a field or the validity of a field.

We also have interfaces where the ioctl is a header plus a data blob
where flags tell us what the data is. These can serve double duty as a
extension check too as we've done for VFIO_DEVICE_FEATURE. This
doesn't really support extension of a defined feature though, rather
we'd be more likely to create a set of flags that indicate the data
object is feature-v2 and redefine the structure, or of course we
revisit the entire featuring question within the structure of that data
blob.

We also implement capability chains, though they're more meant for
passing data to the user, where the user provides a buffer and we link
capabilities together within that buffer for the user to walk. We've
defined a mechanism through -ENOSPC and argsz to tell the user how
large a buffer is necessary. I dare mention we have a version per
capability as these are largely modeled after capability chains in PCI
config space. We haven't actually incremented any versions, but I
imagine we'd do so like PCI, maintaining backwards compatibility and
only defining unused bits and adding fields as the version increases.

Is the objection to a global version or to any version fields? I don't
really understand the global version, I'd think a mechanism to check
extensions plus a per structure flags/version would be preferred. The
former should resolve how userspace can test support for features
requiring multiple interfaces. A global version also implies that
we're only ever adding features and never removing. For example,
feature Foo is added in version 4, but it's replaced by feature Bar in
version 5, now userspace can't simply test version >= 4 must include
feature Foo.

It seems to me that version and flags can also be complimentary, for
example a field might be defined by a version but a flag could indicate
if it's implemented. With only the flag, we'd infer the field from the
flag, with only the version we'd need to assume the field is always
implemented. So I have a hard time making a blanket statement that all
versions fields should be avoided.

> > 3. Maintain backward compatibility while allowing extensions?
> >
> > I think we all agreed that using flags (capability or types) is the
> > way to address #3. As Christoph pointed out, version number should
> > not be used for this purpose.
> >
> > So for problem 1 & 2, we have the following options:
> > 1. Have a version-size mapping as proposed in this set. VFIO copies
> > from user the correct size based on version-type lookup. Processing
> > of the data is based on flags in IOMMU driver.
> >
> > 2. VFIO copy its own minsz then pass the user pointer to IOMMU driver
> > for further copy_from_user based on flags. (by Kevin)
> >
> > 3. Adopt VFIO argsz scheme, caller fills in argsz for the offset the
> > variable size union. VFIO do not check argsz in that it requires IOMMU
> > specific knowledge. IOMMU driver Use flags to handle the variable
> > size.(by Alex). I think this what we have in Yi's VFIO & QEMU patch.
> > argsz filled by QEMU includes bind_data.
> >
> > 4. Do not use a unified version, have a fixed size of all UAPI
> > structures, padding in struct and union. (Wasteful, not preferred per
> > V1 discussion)
> >
> > For both 2 & 3, a unified version is not used, each API
> > treated separately. vIOMMU will be launched w/o assurance of
> > compatibility of all APIs. Fault handling may be more complex in
> > normal operations.
> >
> > Appreciate everyone's input. Joerg and Alex, could you help to make a
> > decision here?

As above, I think using a global API version number to imply support
for a feature is doomed to fail, we should instead expose an interface
to check for specific features. In any of the proposed solutions, the
IOMMU driver is ultimately responsible for validating the user data, so
do we want vfio performing the copy_from_user() to an object that could
later be assumed to be sanitized, or should vfio just pass a user
pointer to make it obvious that the consumer is responsible for all the
user protections? Seems like the latter. That still really doesn't
address what's in that user data blob yet, but the vfio interface could
be:

struct {
__u32 argsz;
__u32 flags;
__u8 data[];
}

Where flags might be partitioned like we do for DEVICE_FEATURE to
indicate the format of data and what vfio should do with it, and data
might simply be defined as a (__u64 __user *).

This user pointer would then likely be an IOMMU UAPI struct, so I've
only just gotten back the the IOMMU UAPI question at hand, but I don't
really see the disadvantage to including both version and flags fields
per structure. Perhaps this is choice 1. above, but with a version at
a per structure level indicating the backwards compatible size and
layout of the structure and flags being used to indicate support for
optional features within those fields. Is a version field still taboo
for such a use case? Thanks,

Alex