Re: [RFC PATCH 6/6] iommu/amd: Introduce nested translation support

From: Jason Gunthorpe
Date: Wed Dec 13 2023 - 08:55:36 EST


On Tue, Dec 12, 2023 at 10:01:39AM -0600, Suravee Suthikulpanit wrote:

> - if ((flags & ~IOMMU_HWPT_ALLOC_DIRTY_TRACKING) || parent || user_data)
> + ret = udata_to_iommu_hwpt_amd_v2(user_data, &hwpt);
> + if (ret)
> + return ERR_PTR(ret);
> +
> + return amd_iommu_nested_domain_alloc(dev, &hwpt);
> + }
> +
> + /* Check supported flags */
> + if (flags & (~(IOMMU_HWPT_ALLOC_NEST_PARENT |
> + IOMMU_HWPT_ALLOC_DIRTY_TRACKING)))
> + return ERR_PTR(-EOPNOTSUPP);
> +
> + if (!check_nested_support(flags))
> return ERR_PTR(-EOPNOTSUPP);
>
> - return do_iommu_domain_alloc(type, dev, flags);
> + dom = iommu_domain_alloc(dev->bus);

Please don't call iommu_domain_alloc() here. Call your internal
function and force it to allocate the v1 domain.

> +static int nested_gcr3_update(struct iommu_hwpt_amd_v2 *hwpt, struct iommu_domain *udom)
> +{
> + int ret;
> + u16 hdev_id;
> + struct pci_dev *pdev;
> + struct amd_iommu *iommu;
> +
> + iommu = get_amd_iommu_from_devid(hwpt->iommu_id);
> + hdev_id = get_hdev_id(iommu, hwpt->gid, hwpt->gdev_id);
> +
> + pr_debug("%s: gid=%u, hdev_id=%#x, gcr3=%#llx\n",
> + __func__, hwpt->gid, hdev_id,
> + (unsigned long long) hwpt->gcr3);
> +
> + pdev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(hdev_id),
> + hdev_id & 0xff);

Huh? "hdev_id"? This is not OK.

The device you are allowed to look at is the "struct device *dev" passed
to alloc. You cannot pass in a struct device and then override it with
another value.

> + if (!pdev)
> + return -EINVAL;
> +
> + /* Note: Currently only support GCR3TRPMode with nested translation */
> + if (!check_feature2(FEATURE_GCR3TRPMODE))
> + return -EOPNOTSUPP;
> +
> + ret = amd_iommu_set_gcr3tbl_trp(iommu, pdev, hwpt->gcr3, hwpt->glx,
> + hwpt->guest_paging_mode);

Waah?

This is touching the dev table? That is not right. Allocation is only
*ALLOCATION*. The dev table can't be changed until you do attachment.

Please look at the smmuv3 patches and try to be structurally
similar. AMD and SMMUv3 are *very similar* in how their HW works
excluding the viommu stuff.

You also can't assume your parent is currently attached to anything.

The construction of the DTE has to be from-scratch, based on the parent
domain and the provided values in the "hwpt". Again, see how smmuv3
does this, where there is one function that builds the entire DTE
(called STE there).
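
Roughly the shape I mean, as a user-space sketch and not driver code.
Every name here (sketch_dte, sketch_parent, sketch_nested, the helper
functions) is hypothetical, and the bit positions are invented for
illustration; they are not the real DTE layout. The point is only the
split: alloc records the hwpt values, one function builds the whole
entry from the parent plus those values, and attach is the only place
that writes the table.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; field layouts are invented for illustration. */
#define SKETCH_DTE_VALID 1ULL

struct sketch_dte {
	uint64_t data[4];
};

struct sketch_parent {
	uint64_t host_pt_root;	/* root of the v1 (host) page table */
	uint16_t domid;
};

struct sketch_nested {
	const struct sketch_parent *parent;
	uint64_t gcr3;
	uint8_t glx;
	uint8_t guest_paging_mode;
};

/* Allocation step: only record the parameters, touch no device table. */
static int sketch_nested_init(struct sketch_nested *nd,
			      const struct sketch_parent *parent,
			      uint64_t gcr3, uint8_t glx, uint8_t gpm)
{
	if (!nd || !parent)
		return -EINVAL;

	nd->parent = parent;
	nd->gcr3 = gcr3;
	nd->glx = glx;
	nd->guest_paging_mode = gpm;
	return 0;
}

/* Build a complete entry from scratch; no dependence on the old entry. */
static void sketch_make_nested_dte(struct sketch_dte *out,
				   const struct sketch_nested *nd)
{
	memset(out, 0, sizeof(*out));
	out->data[0] = nd->parent->host_pt_root | SKETCH_DTE_VALID;
	out->data[1] = (uint64_t)nd->parent->domid |
		       ((uint64_t)nd->glx << 16) |
		       ((uint64_t)nd->guest_paging_mode << 24);
	out->data[2] = nd->gcr3;
}

/* Attach step: the only writer, and only for the device it was given. */
static void sketch_attach(struct sketch_dte *dev_table, uint16_t devid,
			  const struct sketch_nested *nd)
{
	struct sketch_dte new_dte;

	sketch_make_nested_dte(&new_dte, nd);
	dev_table[devid] = new_dte;
}
```

Note that nothing in the init path needs the parent to be attached to
any device, and the attach path never looks up a device other than the
one it was handed.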

I'm skeptical you can do this properly without also restructuring the
DTE logic like I've mentioned before, there is a reason I did that for
SMMUv3. :)

> +struct iommu_domain *amd_iommu_nested_domain_alloc(struct device *dev,
> + struct iommu_hwpt_amd_v2 *hwpt)
> +{
> + int ret;
> + struct iommu_domain *dom;
> + struct protection_domain *pdom;
> +
> + dom = iommu_domain_alloc(dev->bus);
> + if (!dom)
> + return ERR_PTR(-ENOMEM);

Also no, do not allocate a normal domain and then 'wreck'
it into a nesting domain. Refactor the allocation code to be in
smaller chunks so you can alloc and init the memory directly here.
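
Something like the following split, again as a user-space sketch with
hypothetical names (sketch_pdom and friends) and calloc() standing in
for the kernel allocators. It is an assumption about how the code could
be factored, not what the driver does today: a small common helper sets
up the shared fields, and the paging and nested paths each init only
what they need, so the nested domain never owns a page table that has
to be torn down afterwards.

```c
#include <stdlib.h>

enum sketch_kind { SKETCH_PAGING, SKETCH_NESTED };

/* Hypothetical, simplified protection domain. */
struct sketch_pdom {
	enum sketch_kind kind;
	int id;
	void *pt_root;				/* paging domains only */
	const struct sketch_pdom *parent;	/* nested domains only */
	unsigned long long gcr3;		/* nested domains only */
};

static int sketch_next_id = 1;

/* Shared chunk: allocate the memory and fill in the common fields. */
static struct sketch_pdom *sketch_pdom_alloc_common(enum sketch_kind kind)
{
	struct sketch_pdom *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	p->kind = kind;
	p->id = sketch_next_id++;
	return p;
}

/* Normal domain: common part plus its own page table root. */
static struct sketch_pdom *sketch_paging_alloc(void)
{
	struct sketch_pdom *p = sketch_pdom_alloc_common(SKETCH_PAGING);

	if (!p)
		return NULL;
	p->pt_root = calloc(1, 4096);
	if (!p->pt_root) {
		free(p);
		return NULL;
	}
	return p;
}

/* Nested domain: common part plus parent and guest values, no page table. */
static struct sketch_pdom *sketch_nested_alloc(const struct sketch_pdom *parent,
					       unsigned long long gcr3)
{
	struct sketch_pdom *p;

	if (!parent || parent->kind != SKETCH_PAGING)
		return NULL;
	p = sketch_pdom_alloc_common(SKETCH_NESTED);
	if (!p)
		return NULL;
	p->parent = parent;
	p->gcr3 = gcr3;
	return p;
}

static void sketch_pdom_free(struct sketch_pdom *p)
{
	if (!p)
		return;
	free(p->pt_root);	/* NULL for nested domains, which is fine */
	free(p);
}
```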

Jason