RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

From: Liu, Yi L
Date: Thu Apr 09 2020 - 23:58:01 EST


Hi Eric,

> From: Auger Eric <eric.auger@xxxxxxxxxx>
> Sent: Friday, April 10, 2020 11:28 AM
> To: Liu, Yi L <yi.l.liu@xxxxxxxxx>; Jean-Philippe Brucker <jean-
> Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to
> userspace
>
> Hi Yi,
>
> On 4/9/20 2:47 PM, Liu, Yi L wrote:
> > Hi Jean,
> >
> >> From: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
> >> Sent: Thursday, April 9, 2020 4:15 PM
> >> Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1
> >> format to userspace
> >>
> >> On Wed, Apr 08, 2020 at 12:27:58PM +0200, Auger Eric wrote:
> >>> Hi Yi,
> >>>
> >>> On 4/7/20 11:43 AM, Liu, Yi L wrote:
> >>>> Hi Jean,
> >>>>
> >>>>> From: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
> >>>>> Sent: Friday, April 3, 2020 4:23 PM
> >>>>> To: Auger Eric <eric.auger@xxxxxxxxxx> userspace
> >>>>>
> >>>>> On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote:
> >>>>>>>>> header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
> >>>>>>>>>
> >> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
> >>>>> @@ -2254,6 +2309,7
> >>>>>>>>> @@ static int vfio_iommu_info_add_nesting_cap(struct
> >>>>>>>> vfio_iommu *iommu,
> >>>>>>>>> /* nesting iommu type supports PASID requests
> (alloc/free)
> >> */
> >>>>>>>>> nesting_cap->nesting_capabilities |=
> >> VFIO_IOMMU_PASID_REQS;
> >>>>>>>> What is the meaning for ARM?
> >>>>>>>
> >>>>>>> I think it's just a software capability exposed to userspace, on
> >>>>>>> userspace side, it has a choice to use it or not. :-) The reason
> >>>>>>> define it and report it in cap nesting is that I'd like to make
> >>>>>>> the pasid alloc/free be available just for IOMMU with type
> >>>>>>> VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not
> >>>>>>> good for ARM. We can find a proper way to report the availability.
> >>>>>>
> >>>>>> Well it is more a question for jean-Philippe. Do we have a system
> >>>>>> wide PASID allocation on ARM?
> >>>>>
> >>>>> We don't, the PASID spaces are per-VM on Arm, so this function
> >>>>> should consult
> >> the
> >>>>> IOMMU driver before setting flags. As you said on patch 3, nested
> >>>>> doesn't necessarily imply PASID support. The SMMUv2 does not
> >>>>> support PASID but does support nesting stages 1 and 2 for the IOVA space.
> >>>>> SMMUv3 support of PASID depends on HW capabilities. So I think
> >>>>> this needs to
> >> be
> >>>>> finer grained:
> >>>>>
> >>>>> Does the container support:
> >>>>> * VFIO_IOMMU_PASID_REQUEST?
> >>>>> -> Yes for VT-d 3
> >>>>> -> No for Arm SMMU
> >>>>> * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL?
> >>>>> -> Yes for VT-d 3
> >>>>> -> Sometimes for SMMUv2
> >>>>> -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler
> due to
> >>>>> PASID tables being in GPA space.)
> >>>>> * VFIO_IOMMU_BIND_PASID_TABLE?
> >>>>> -> No for VT-d
> >>>>> -> Sometimes for SMMUv3
> >>>>>
> >>>>> Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support.
> >>>>
> >>>> good summary. do you expect to see any
> >>>>
> >>>>>
> >>>>>>>>> + nesting_cap->stage1_formats = formats;
> >>>>>>>> as spotted by Kevin, since a single format is supported, rename
> >>>>>>>
> >>>>>>> ok, I was believing it may be possible on ARM or so. :-) will
> >>>>>>> rename it.
> >>>>>
> >>>>> Yes I don't think an u32 is going to cut it for Arm :( We need to
> >>>>> describe all sorts
> >> of
> >>>>> capabilities for page and PASID tables (granules, GPA size,
> >>>>> ASID/PASID size, HW access/dirty, etc etc.) Just saying "Arm
> >>>>> stage-1 format" wouldn't mean much. I guess we could have a secondary
> vendor capability for these?
> >>>>
> >>>> Actually, I'm wondering if we can define some formats to stands for
> >>>> a set of capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may
> >>>> indicates the 1st level page table related caps (aw, a/d, SRE, EA
> >>>> and etc.). And vIOMMU can parse the capabilities.
> >>>
> >>> But eventually do we really need all those capability getters? I
> >>> mean can't we simply rely on the actual call to
> >>> VFIO_IOMMU_BIND_GUEST_PGTBL() to detect any mismatch? Definitively
> >>> the error handling may be heavier on userspace but can't we manage.
> >>
> >> I think we need to present these capabilities at boot time, long
> >> before the guest triggers a bind(). For example if the host SMMU
> >> doesn't support 16-bit ASID, we need to communicate that to the guest
> >> using vSMMU ID registers or PROBE properties. Otherwise a bind() will
> >> succeed, but if the guest uses 16-bit ASIDs in its CD, DMA will
> >> result in C_BAD_CD events which we'll inject into the guest, for no
> >> apparent reason from their perspective.
> >>
> >> In addition some VMMs may have fallbacks if shared page tables are
> >> not available. They could fall back to a MAP/UNMAP interface, or
> >> simply not present a vIOMMU to the guest.
> >>
> >
> > Based on the comments, I think it would be a need to report iommu caps
> > in detail. So I guess iommu uapi needs to provide something alike vfio
> > cap chain in iommu uapi. Please feel free let me know your thoughts.
> > :-)
>
> Yes to me it sounds sensible.
> >
> > In vfio, we can define a cap as below:
> >
> > struct vfio_iommu_type1_info_cap_nesting {
> > struct vfio_info_cap_header header;
> > __u64 iommu_model;
> > #define VFIO_IOMMU_PASID_REQS (1 << 0)
> I still think the name shall be changed

yes, I'll rename it per your suggestion.:-)

> > #define VFIO_IOMMU_BIND_GPASID (1 << 1)
> > #define VFIO_IOMMU_CACHE_INV (1 << 2)
> this operation seems mandated as soon as we have a nested paging based
> implementation?

oh, yes, should be. will remove it and comment in the code.

Regards,
Yi Liu

> > __u32 nesting_capabilities;
> > __u32 pasid_bits;
> > #define VFIO_IOMMU_VENDOR_SUB_CAP (1 << 3)
> > __u32 flags;
> > __u32 data_size;
> > __u8 data[]; /*iommu info caps defined by iommu uapi */
> > };
> >
> > VFIO needs new iommu APIs to ask iommu driver whether
> > PASID/bind_gpasid/ cache_inv/bind_gpasid_table is available or not and
> > also the pasid bits. After that VFIO will ask iommu driver about the
> > iommu_cap_info and fill in the @data[] field.
> >
> > iommu uapi:
> > struct iommu_info_cap_header {
> > __u16 id; /* Identifies capability */
> > __u16 version; /* Version specific to the capability ID */
> > __u32 next; /* Offset of next capability */
> > };
> >
> > #define IOMMU_INFO_CAP_INTEL_VTD 1
> > struct iommu_info_cap_intel_vtd {
> > struct iommu_info_cap_header header;
> > __u32 vaddr_width; /* VA addr_width*/
> > __u32 ipaddr_width; /* IPA addr_width, input of SL page table */
> > /* same definition with @flags instruct iommu_gpasid_bind_data_vtd */
> > __u64 flags;
> > };
> >
> > #define IOMMU_INFO_CAP_ARM_SMMUv3 2
> > struct iommu_info_cap_arm_smmuv3 {
> > struct iommu_info_cap_header header;
> > ...
> > };
>
> Thanks
>
> Eric
> >
> > Regards,
> > Yi Liu
> >