RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

From: Liu, Yi L
Date: Thu Apr 09 2020 - 08:47:24 EST


Hi Jean,

> From: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
> Sent: Thursday, April 9, 2020 4:15 PM
> Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to
> userspace
>
> On Wed, Apr 08, 2020 at 12:27:58PM +0200, Auger Eric wrote:
> > Hi Yi,
> >
> > On 4/7/20 11:43 AM, Liu, Yi L wrote:
> > > Hi Jean,
> > >
> > >> From: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
> > >> Sent: Friday, April 3, 2020 4:23 PM
> > >> To: Auger Eric <eric.auger@xxxxxxxxxx>
> > >> userspace
> > >>
> > >> On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote:
> > >>>>>> header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
> > >>>>>>
> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
> > >> @@ -2254,6 +2309,7
> > >>>>>> @@ static int vfio_iommu_info_add_nesting_cap(struct
> > >>>>> vfio_iommu *iommu,
> > >>>>>> /* nesting iommu type supports PASID requests (alloc/free)
> */
> > >>>>>> nesting_cap->nesting_capabilities |=
> VFIO_IOMMU_PASID_REQS;
> > >>>>> What is the meaning for ARM?
> > >>>>
> > >>>> I think it's just a software capability exposed to userspace, on
> > >>>> userspace side, it has a choice to use it or not. :-) The reason
> > >>>> define it and report it in cap nesting is that I'd like to make the
> > >>>> pasid alloc/free be available just for IOMMU with type
> > >>>> VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not good
> > >>>> for ARM. We can find a proper way to report the availability.
> > >>>
> > >>> Well it is more a question for jean-Philippe. Do we have a system wide
> > >>> PASID allocation on ARM?
> > >>
> > >> We don't, the PASID spaces are per-VM on Arm, so this function should consult
> the
> > >> IOMMU driver before setting flags. As you said on patch 3, nested doesn't
> > >> necessarily imply PASID support. The SMMUv2 does not support PASID but does
> > >> support nesting stages 1 and 2 for the IOVA space.
> > >> SMMUv3 support of PASID depends on HW capabilities. So I think this needs to
> be
> > >> finer grained:
> > >>
> > >> Does the container support:
> > >> * VFIO_IOMMU_PASID_REQUEST?
> > >> -> Yes for VT-d 3
> > >> -> No for Arm SMMU
> > >> * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL?
> > >> -> Yes for VT-d 3
> > >> -> Sometimes for SMMUv2
> > >> -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due to
> > >> PASID tables being in GPA space.)
> > >> * VFIO_IOMMU_BIND_PASID_TABLE?
> > >> -> No for VT-d
> > >> -> Sometimes for SMMUv3
> > >>
> > >> Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support.
> > >
> > > good summary. do you expect to see any
> > >
> > >>
> > >>>>>> + nesting_cap->stage1_formats = formats;
> > >>>>> as spotted by Kevin, since a single format is supported, rename
> > >>>>
> > >>>> ok, I was believing it may be possible on ARM or so. :-) will rename
> > >>>> it.
> > >>
> > >> Yes I don't think an u32 is going to cut it for Arm :( We need to
> > >> describe all sorts
> of
> > >> capabilities for page and PASID tables (granules, GPA size, ASID/PASID size, HW
> > >> access/dirty, etc etc.) Just saying "Arm stage-1 format" wouldn't mean much. I
> > >> guess we could have a secondary vendor capability for these?
> > >
> > > Actually, I'm wondering if we can define some formats to stands for a set of
> > > capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may indicates the 1st level
> > > page table related caps (aw, a/d, SRE, EA and etc.). And vIOMMU can parse
> > > the capabilities.
> >
> > But eventually do we really need all those capability getters? I mean
> > can't we simply rely on the actual call to VFIO_IOMMU_BIND_GUEST_PGTBL()
> > to detect any mismatch? Definitively the error handling may be heavier
> > on userspace but can't we manage.
>
> I think we need to present these capabilities at boot time, long before
> the guest triggers a bind(). For example if the host SMMU doesn't support
> 16-bit ASID, we need to communicate that to the guest using vSMMU ID
> registers or PROBE properties. Otherwise a bind() will succeed, but if the
> guest uses 16-bit ASIDs in its CD, DMA will result in C_BAD_CD events
> which we'll inject into the guest, for no apparent reason from their
> perspective.
>
> In addition some VMMs may have fallbacks if shared page tables are not
> available. They could fall back to a MAP/UNMAP interface, or simply not
> present a vIOMMU to the guest.
>

Based on the comments, I think it would be a need to report iommu caps
in detail. So I guess iommu uapi needs to provide something alike vfio
cap chain in iommu uapi. Please feel free let me know your thoughts. :-)

In vfio, we can define a cap as below:

struct vfio_iommu_type1_info_cap_nesting {
struct vfio_info_cap_header header;
__u64 iommu_model;
#define VFIO_IOMMU_PASID_REQS (1 << 0)
#define VFIO_IOMMU_BIND_GPASID (1 << 1)
#define VFIO_IOMMU_CACHE_INV (1 << 2)
__u32 nesting_capabilities;
__u32 pasid_bits;
#define VFIO_IOMMU_VENDOR_SUB_CAP (1 << 3)
__u32 flags;
__u32 data_size;
__u8 data[]; /*iommu info caps defined by iommu uapi */
};

VFIO needs new iommu APIs to ask iommu driver whether PASID/bind_gpasid/
cache_inv/bind_gpasid_table is available or not and also the pasid
bits. After that VFIO will ask iommu driver about the iommu_cap_info
and fill in the @data[] field.

iommu uapi:
struct iommu_info_cap_header {
__u16 id; /* Identifies capability */
__u16 version; /* Version specific to the capability ID */
__u32 next; /* Offset of next capability */
};

#define IOMMU_INFO_CAP_INTEL_VTD 1
struct iommu_info_cap_intel_vtd {
struct iommu_info_cap_header header;
__u32 vaddr_width; /* VA addr_width*/
__u32 ipaddr_width; /* IPA addr_width, input of SL page table */
/* same definition with @flags instruct iommu_gpasid_bind_data_vtd */
__u64 flags;
};

#define IOMMU_INFO_CAP_ARM_SMMUv3 2
struct iommu_info_cap_arm_smmuv3 {
struct iommu_info_cap_header header;
...
};

Regards,
Yi Liu