RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

From: Liu, Yi L
Date: Fri Apr 03 2020 - 07:59:46 EST


Hi Alex,

> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Sent: Friday, April 3, 2020 3:20 AM
> To: Liu, Yi L <yi.l.liu@xxxxxxxxx>
> Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to
> userspace
>
> On Sun, 22 Mar 2020 05:32:02 -0700
> "Liu, Yi L" <yi.l.liu@xxxxxxxxx> wrote:
>
> > From: Liu Yi L <yi.l.liu@xxxxxxxxx>
> >
> > VFIO exposes IOMMU nesting translation (a.k.a. dual-stage translation)
> > capability to userspace. Thus applications like QEMU could support
> > vIOMMU with hardware's nesting translation capability for pass-through
> > devices. Before setting up nesting translation for pass-through devices,
> > QEMU and other applications need to learn the supported 1st-lvl/stage-1
> > translation structure format, such as the page table format.
> >
> > Take vSVA (virtual Shared Virtual Addressing) as an example: to support
> > vSVA for pass-through devices, QEMU sets up nesting translation for
> > pass-through devices. The guest page tables are configured on the host
> > as the 1st-lvl/stage-1 page tables. Therefore, the guest format must be
> > compatible with the host side.
> >
> > This patch reports the supported 1st-lvl/stage-1 page table format on the
> > current platform to userspace. QEMU and similar applications should
> > use this format info when setting up IOMMU nesting translation on the
> > host IOMMU.
> >
> > Cc: Kevin Tian <kevin.tian@xxxxxxxxx>
> > CC: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
> > Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > Cc: Eric Auger <eric.auger@xxxxxxxxxx>
> > Cc: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
> > Signed-off-by: Liu Yi L <yi.l.liu@xxxxxxxxx>
> > ---
> > drivers/vfio/vfio_iommu_type1.c | 56 +++++++++++++++++++++++++++++++++++++++++
> > include/uapi/linux/vfio.h | 1 +
> > 2 files changed, 57 insertions(+)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index 9aa2a67..82a9e0b 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -2234,11 +2234,66 @@ static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu,
> >  	return ret;
> >  }
> >
> > +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu,
> > +					 u32 *stage1_format)
> > +{
> > +	struct vfio_domain *domain;
> > +	u32 format = 0, tmp_format = 0;
> > +	int ret;
> > +
> > +	mutex_lock(&iommu->lock);
> > +	if (list_empty(&iommu->domain_list)) {
> > +		mutex_unlock(&iommu->lock);
> > +		return -EINVAL;
> > +	}
> > +
> > +	list_for_each_entry(domain, &iommu->domain_list, next) {
> > +		if (iommu_domain_get_attr(domain->domain,
> > +			DOMAIN_ATTR_PASID_FORMAT, &format)) {
> > +			ret = -EINVAL;
> > +			format = 0;
> > +			goto out_unlock;
> > +		}
> > +		/*
> > +		 * format is always non-zero (the first format is
> > +		 * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). Since the
> > +		 * backing IOMMUs may use different formats, we expect
> > +		 * identical formats across the domain list; mixed
> > +		 * formats are not supported. Return -EINVAL to fail
> > +		 * the attempt to set up VFIO_TYPE1_NESTING_IOMMU if
> > +		 * non-identical formats are detected.
> > +		 */
> > +		if (tmp_format && tmp_format != format) {
> > +			ret = -EINVAL;
> > +			format = 0;
> > +			goto out_unlock;
> > +		}
> > +
> > +		tmp_format = format;
> > +	}
> > +	ret = 0;
> > +
> > +out_unlock:
> > +	if (format)
> > +		*stage1_format = format;
> > +	mutex_unlock(&iommu->lock);
> > +	return ret;
> > +}
> > +
> >  static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
> >  					    struct vfio_info_cap *caps)
> >  {
> >  	struct vfio_info_cap_header *header;
> >  	struct vfio_iommu_type1_info_cap_nesting *nesting_cap;
> > +	u32 formats = 0;
> > +	int ret;
> > +
> > +	ret = vfio_iommu_get_stage1_format(iommu, &formats);
> > +	if (ret) {
> > +		pr_warn("Failed to get stage-1 format\n");
> > +		return ret;
>
> Looks like this generates a warning and causes the iommu_get_info ioctl
> to fail if the hardware doesn't support the pasid format attribute, or
> the domain list is empty. This breaks users on existing hardware.

Oops, yes. It should not fail anything, as this is just an extended feature.
Let me correct it.
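
For illustration only, here is a minimal user-space sketch of the behavior
described above, not the actual fix. It assumes that a missing or
inconsistent stage-1 format should simply cause the nesting capability to be
left out of the info reply, while the ioctl itself still succeeds. All
mock_* names are hypothetical stand-ins and do not exist in the kernel; a
pasid_format of 0 stands for "the hardware does not support the attribute".

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one IOMMU domain attached to the container. */
struct mock_domain {
	uint32_t pasid_format;	/* 0: format attribute not supported */
};

/* Hypothetical stand-in for the info reply returned to userspace. */
struct mock_info {
	int has_nesting_cap;	/* 1 if the nesting capability was added */
	uint32_t stage1_format;	/* valid only when has_nesting_cap is 1 */
};

/*
 * Mirrors the check in the patch: fail on an empty domain list, on a
 * domain without a format, and on mixed formats across domains.
 */
int mock_get_stage1_format(const struct mock_domain *domains, size_t n,
			   uint32_t *stage1_format)
{
	uint32_t format = 0;
	size_t i;

	if (n == 0)
		return -EINVAL;

	for (i = 0; i < n; i++) {
		if (!domains[i].pasid_format)
			return -EINVAL;
		if (format && format != domains[i].pasid_format)
			return -EINVAL;
		format = domains[i].pasid_format;
	}

	*stage1_format = format;
	return 0;
}

/*
 * Assumed non-fatal handling: if the format cannot be determined, skip
 * the capability and report success so existing users keep working.
 */
int mock_info_add_nesting_cap(const struct mock_domain *domains, size_t n,
			      struct mock_info *info)
{
	uint32_t format = 0;

	info->has_nesting_cap = 0;
	info->stage1_format = 0;

	if (mock_get_stage1_format(domains, n, &format))
		return 0;	/* optional feature: do not fail the caller */

	info->has_nesting_cap = 1;
	info->stage1_format = format;
	return 0;
}
```

With this shape, userspace on existing hardware sees the same info reply as
before, and only nesting-aware users look for the extra capability.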

Thanks,
Yi Liu