Re: [RFCv2 PATCH 6/7] iommu/amd: Add nested domain allocation support

From: Jason Gunthorpe
Date: Fri Mar 08 2024 - 09:05:05 EST


On Thu, Jan 11, 2024 at 06:06:45PM -0600, Suravee Suthikulpanit wrote:
> @@ -2367,8 +2372,9 @@ static struct protection_domain *protection_domain_alloc(unsigned int type)
> domain->nid = NUMA_NO_NODE;
>
> switch (type) {
> - /* No need to allocate io pgtable ops in passthrough mode */
> + /* No need to allocate io pgtable ops in passthrough and nested mode */
> case IOMMU_DOMAIN_IDENTITY:
> + case IOMMU_DOMAIN_NESTED:

These constants should not show up in the driver, it needs to be
reorganized to use the new allocation APIs to avoid this.

> -static struct iommu_domain *do_iommu_domain_alloc(unsigned int type,
> +static const struct iommu_domain_ops nested_domain_ops = {
> + .attach_dev = amd_iommu_attach_device,
> + .free = amd_iommu_domain_free,
> +};

I would expect nesting to have its own attach function too, because it
should be quite different.

> static struct iommu_domain *
> amd_iommu_domain_alloc_user(struct device *dev, u32 flags,
> struct iommu_domain *parent,
> const struct iommu_user_data *user_data)
> -
> {
> + struct iommu_domain *dom;
> + struct iommu_dev_data *dev_data;
> unsigned int type = IOMMU_DOMAIN_UNMANAGED;
> + bool nested_parent = flags & IOMMU_HWPT_ALLOC_NEST_PARENT;
> +
> + if (parent) {
> + int ret;
> + struct iommu_hwpt_amd_v2 hwpt;
> +
> + if (parent->ops != amd_iommu_ops.default_domain_ops)
> + return ERR_PTR(-EINVAL);
> +
> + ret = udata_to_iommu_hwpt_amd_v2(user_data, &hwpt);
> + if (ret)
> + return ERR_PTR(ret);
>
> - if ((flags & ~IOMMU_HWPT_ALLOC_DIRTY_TRACKING) || parent || user_data)
> + return amd_iommu_nested_domain_alloc(dev, type, flags,
> + &hwpt, parent);
> + }
> +
> + /* Check supported flags */
> + if ((flags & ~amd_iommu_hwpt_supported_flags) ||
> + !check_nested_support(flags))
> return ERR_PTR(-EOPNOTSUPP);
>
> - return do_iommu_domain_alloc(type, dev, flags);
> + dev_data = dev_iommu_priv_get(dev);
> +
> + /*
> + * When allocated nested parent domain, the device may already
> + * have been attached to a domain. For example, a device is already
> + * attached to the domain allocated by VFIO, which contains GPA->SPA mapping.
> + * In such case, return reference to the same domain.
> + */

What? No! This would break all the lifecycle model. Domain allocation must
always allocate. qemu needs to allocate the nested partent domain type
from the very start.

> +static int nested_gcr3_update(struct iommu_hwpt_amd_v2 *hwpt,
> + struct protection_domain *pdom,
> + struct protection_domain *ppdom,
> + struct device *dev)
> +{
> + struct pci_dev *pdev;
> + struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
> +
> + pdev = to_pci_dev(dev);
> + if (!pdev)
> + return -EINVAL;
> +
> + /* Note: Currently only support GCR3TRPMode with nested translation */
> + if (!check_feature2(FEATURE_GCR3TRPMODE))
> + return -EOPNOTSUPP;
> +
> + pdom->parent = ppdom;
> + pdom->guest_domain_id = hwpt->gdom_id;
> + pdom->guest_paging_mode = hwpt->flags.guest_paging_mode;
> +
> + dev_data->gcr3_info.trp_gpa = hwpt->gcr3;
> + dev_data->gcr3_info.glx = hwpt->flags.glx;
> + dev_data->gcr3_info.giov = hwpt->flags.giov;

Here again, this is just data copied from the vDTE to the pDTE - the
guest's gcr3 related parameters are just bits being flowed through

And as before "alloc" shouldn't touch anything outside the
iommu_domain being allocated, touching dev_data here is completely
wrong.

Jason