RE: [PATCH v3 07/17] iommufd: Add IOMMU_RESV_IOVA_RANGES

From: Tian, Kevin
Date: Mon Jul 31 2023 - 02:21:56 EST


> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Saturday, July 29, 2023 1:45 AM
>
> On Mon, Jul 24, 2023 at 04:03:56AM -0700, Yi Liu wrote:
>
> > +/**
> > + * struct iommu_resv_iova_ranges - ioctl(IOMMU_RESV_IOVA_RANGES)
> > + * @size: sizeof(struct iommu_resv_iova_ranges)
> > + * @dev_id: device to read resv iova ranges for
> > + * @num_iovas: Input/Output total number of resv ranges for the device
> > + * @__reserved: Must be 0
> > + * @resv_iovas: Pointer to the output array of struct
> iommu_resv_iova_range
> > + *
> > + * Query a device for ranges of reserved IOVAs. num_iovas will be set to
> the
> > + * total number of iovas and the resv_iovas[] will be filled in as space
> > + * permits.
> > + *
> > + * On input num_iovas is the length of the resv_iovas array. On output it is
> > + * the total number of iovas filled in. The ioctl will return -EMSGSIZE and
> > + * set num_iovas to the required value if num_iovas is too small. In this
> > + * case the caller should allocate a larger output array and re-issue the
> > + * ioctl.
> > + *
> > + * Under nested translation, userspace should query the reserved IOVAs
> for a
> > + * given device, and report it to the stage-1 I/O page table owner to
> exclude
> > + * the reserved IOVAs. The reserved IOVAs can also be used to figure out
> the
> > + * allowed IOVA ranges for the IOAS that the device is attached to. For
> detail
> > + * see ioctl IOMMU_IOAS_IOVA_RANGES.
>
> I'm not sure I like this, the other APIs here work with the *allowed*
> IOVAs, which is the inverse of this one which works with the
> *disallowed* IOVAs.
>
> It means we can't take the output of this API and feed it into
> IOMMUFD_CMD_IOAS_ALLOW_IOVAS.. Though I suppose qemu isn't going
> to do
> this anyhow.
>
> On the other hand, it is kind of hard to intersect an allowed list of
> multiple idevs into a single master list.
>
> As it is, userspace will have to aggregate the list, sort it, merge
> adjacent overlapping reserved ranges then invert the list to get an
> allowed list. This is not entirely simple..
>
> Did you already write an algorithm to do this in qemu someplace?

Qemu is optional to aggregate it for S2 given IOMMU_IOAS_IOVA_RANGES
is still being used. If the only purpose of using this new cmd is to report
per-device reserved ranges to the guest then aggregation is not required.

Arguably IOMMU_IOAS_IOVA_RANGES becomes redundant with this
new cmd. But it's already there and as you said it's actually more
convenient to be used if the user doesn't care about per-device
reserved ranges...

>
> Anyhow, this should be split out from this series. It seems simple
> enough to merge it now if someone can confirm what qemu needs.
>
> Jason