Re: [PATCH v3 07/17] iommufd: Add IOMMU_RESV_IOVA_RANGES

From: Jason Gunthorpe
Date: Fri Jul 28 2023 - 13:45:04 EST


On Mon, Jul 24, 2023 at 04:03:56AM -0700, Yi Liu wrote:

> +/**
> + * struct iommu_resv_iova_ranges - ioctl(IOMMU_RESV_IOVA_RANGES)
> + * @size: sizeof(struct iommu_resv_iova_ranges)
> + * @dev_id: device to read resv iova ranges for
> + * @num_iovas: Input/Output total number of resv ranges for the device
> + * @__reserved: Must be 0
> + * @resv_iovas: Pointer to the output array of struct iommu_resv_iova_range
> + *
> + * Query a device for ranges of reserved IOVAs. num_iovas will be set to the
> + * total number of iovas and the resv_iovas[] will be filled in as space
> + * permits.
> + *
> + * On input num_iovas is the length of the resv_iovas array. On output it is
> + * the total number of iovas filled in. The ioctl will return -EMSGSIZE and
> + * set num_iovas to the required value if num_iovas is too small. In this
> + * case the caller should allocate a larger output array and re-issue the
> + * ioctl.
> + *
> + * Under nested translation, userspace should query the reserved IOVAs for a
> + * given device, and report it to the stage-1 I/O page table owner to exclude
> + * the reserved IOVAs. The reserved IOVAs can also be used to figure out the
> + * allowed IOVA ranges for the IOAS that the device is attached to. For detail
> + * see ioctl IOMMU_IOAS_IOVA_RANGES.

I'm not sure I like this, the other APIs here work with the *allowed*
IOVAs, which is the inverse of this one which works with the
*disallowed* IOVAs.

It means we can't take the output of this API and feed it into
IOMMUFD_CMD_IOAS_ALLOW_IOVAS.. Though I suppose qemu isn't going to do
this anyhow.

On the other hand, it is kind of hard to intersect an allowed list of
multiple idevs into a single master list.

As it is, userspace will have to aggregate the list, sort it, merge
adjacent overlapping reserved ranges then invert the list to get an
allowed list. This is not entirely simple..

Did you already write an algorithm to do this in qemu someplace?

Anyhow, this should be split out from this series. It seems simple
enough to merge it now if someone can confirm what qemu needs.

Jason