Re: [PATCH v1 1/8] iommu: Add iommu_group_replace_domain()

From: Robin Murphy
Date: Mon Feb 14 2022 - 11:41:44 EST


On 2022-02-14 14:56, Jason Gunthorpe via iommu wrote:
On Mon, Feb 14, 2022 at 02:10:19PM +0000, Robin Murphy wrote:
On 2022-02-14 12:45, Jason Gunthorpe wrote:
On Mon, Feb 14, 2022 at 12:09:36PM +0000, Robin Murphy wrote:
On 2022-01-06 02:20, Lu Baolu wrote:
Expose an interface to replace the domain of an iommu group for frameworks
like vfio which claims the ownership of the whole iommu group.

But if the underlying point is the new expectation that
iommu_{attach,detach}_device() operate on the device's whole group where
relevant, why should we invent some special mechanism for VFIO to be
needlessly inconsistent?

I said before that it's trivial for VFIO to resolve a suitable device if it
needs to; by now I've actually written the patch ;)

https://gitlab.arm.com/linux-arm/linux-rm/-/commit/9f37d8c17c9b606abc96e1f1001c0b97c8b93ed5

Er, how does locking work there? What keeps busdev from being
concurrently unplugged?

Same thing that prevents the bus pointer from suddenly becoming invalid in
the current code, I guess :)

Oooh, yes, that does look broken now too. :(

How can iommu_group_get() be safely called on
this pointer?

What matters is being able to call *other* device-based IOMMU API
interfaces in the long term.

Yes, this is what I mean, those are the ones that call
iommu_group_get().

All of the above only works normally inside a probe/remove context
where the driver core is blocking concurrent unplug and descruction.

I think I said this last time you brought it up that lifetime was the
challenge with this idea.

Indeed, but it's a challenge that needs tackling, because the bus-based
interfaces need to go away. So either we figure it out now and let this
attach interface rework benefit immediately, or I spend three times as long

IMHO your path is easier if you let VFIO stay with the group interface
and use something like:

domain = iommu_group_alloc_domain(group)

Which is what VFIO is trying to accomplish. Since Lu removed the only
other user of iommu_group_for_each_dev() it means we can de-export
that interface.

This works better because the iommu code can hold the internal group
while it finds the bus/device and then invokes the driver op. We don't
have a lifetime problem anymore under that lock.

That's certainly one of the cleaner possibilities - per the theme of this thread I'm not hugely keen on proliferating special VFIO-specific versions of IOMMU APIs, but trying to take the dev->mutex might be a bit heavy-handed and risky, and getting at the vfio_group->device_lock a bit fiddly, so if I can't come up with anything nicer or more general it might be a fair compromise.

The remaining VFIO use of bus for iommu_capable() is better done
against the domain or the group object, as appropriate.

Indeed, although half the implementations of .capable are nonsense already, so I'm treating that one as a secondary priority for the moment (with an aim to come back afterwards and just try to kill it off as far as possible). RDMA and VFIO shouldn't be a serious concern for the kind of systems with heterogeneous IOMMUs at this point.

In the bigger picture, VFIO should stop doing
'iommu_group_alloc_domain' by moving the domain alloc to
VFIO_GROUP_GET_DEVICE_FD where we have a struct device to use.

We've already been experimenting with this for iommufd and the subtle
difference in the uapi doesn't seem relevant.

solving it on my own and end up deleting
iommu_group_replace_domain() in about 6 months' time anyway.

I expect this API to remain until we figure out a solution to the PPC
problem, and come up with an alternative way to change the attached
domain on the fly.

I though PPC wasn't using the IOMMU API at all... or is that the problem?

Thanks,
Robin.