Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups

From: Jason Gunthorpe
Date: Wed Dec 22 2021 - 19:57:18 EST


On Wed, Dec 22, 2021 at 08:26:34PM +0000, Robin Murphy wrote:
> On 21/12/2021 6:46 pm, Jason Gunthorpe wrote:
> > On Tue, Dec 21, 2021 at 04:50:56PM +0000, Robin Murphy wrote:
> >
> > > this proposal is the worst of both worlds, in that drivers still have to be
> > > just as aware of groups in order to know whether to call the _shared
> > > interface or not, except it's now entirely implicit and non-obvious.
> >
> > Drivers are not aware of groups, where did you see that?
>
> `git grep iommu_attach_group -- :^drivers/iommu :^include`
>
> Did I really have to explain that?

Well, yes you did, because it shows you haven't understood my
question. After this series all of those calls are deleted (though
Lu, we missed one of the tegra ones in staging, let's get it for the
next posting).

So, after this series, where do you see drivers being aware of groups?
If anything was missed, let's expect to fix it.

> > If the driver uses multiple struct devices and intends to connect them
> > all to the same domain then it uses the _shared variant. The only
> > difference between the two is the _shared variant lacks some of the
> > protections against driver abuse of the API.
>
> You've lost me again; how are those intentions any different? Attaching one
> device to a private domain is a literal subset of attaching more than one
> device to a private domain.

Yes, it is a subset, but drivers will malfunction if they are not
designed for multi-device attachment and wrongly get it, and only one
driver actually needs this.

I maintain a big driver subsystem and have learned that being able to
grep the driver mess for special cases is quite a good thing to
have. Forcing drivers to mark in code when they do something weird is
an advantage, even if it causes some small API redundancy.

However, if you feel strongly that this should really be one API with
the _shared implementation, I won't argue it any further.

> So then we have the iommu_attach_group() interface for new code (and still
> nobody has got round to updating the old code to it yet), for which
> the

This series is going in the direction of eliminating
iommu_attach_group() as part of the driver
interface. iommu_attach_group() is repurposed to only be useful for
VFIO.

> properly, or iommu_attach_group() with a potentially better interface and
> actual safety. The former is still more prevalent (and the interface
> argument compelling), so if we put the new implementation behind that, with
> the one tweak of having it set DMA_OWNER_PRIVATE_DOMAIN automatically, kill
> off iommu_attach_group() by converting its couple of users,

This is what we did: iommu_attach_device() & _shared() are to be the
only interfaces for drivers, and we killed off the couple of
iommu_attach_group() users except VFIO (the drivers/staging miss
noted above excepted).

> and not only have we solved the VFIO problem but we've also finally
> updated all the legacy code for free! Of course you can have a
> separate version for VFIO to attach with
> DMA_OWNER_PRIVATE_DOMAIN_USER if you like, although I still fail to
> understand the necessity of the distinction.

And the separate version for VFIO is called 'iommu_attach_group()'.

Lu, it is probably a good idea to add an assertion here that the group
is in DMA_OWNER_PRIVATE_DOMAIN_USER to make it clear that
iommu_attach_group() is only for VFIO.
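
Roughly like this (only a sketch; I am guessing at the field name the
series will use to track the owner):

	int iommu_attach_group(struct iommu_domain *domain,
			       struct iommu_group *group)
	{
		/*
		 * Only callers that already claimed the group via
		 * iommu_group_set_dma_owner(DMA_OWNER_PRIVATE_DOMAIN_USER),
		 * ie VFIO, may use this. Kernel drivers must use
		 * iommu_attach_device[_shared]().
		 */
		if (WARN_ON(group->dma_owner != DMA_OWNER_PRIVATE_DOMAIN_USER))
			return -EPERM;

		return __iommu_attach_group(domain, group);
	}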

VFIO has a special requirement that it be able to do:

+	ret = iommu_group_set_dma_owner(group->iommu_group,
+					DMA_OWNER_PRIVATE_DOMAIN_USER, f.file);

Without having an iommu_domain to attach.

This is because of the giant special case that PPC made of VFIO's
IOMMU code. PPC (aka vfio_iommu_spapr_tce.c) requires the group
isolation that iommu_group_set_dma_owner() provides, but does not
actually have an iommu_domain and cannot/does not call
iommu_attach_group().

Fixing this is a whole other giant adventure I'm hoping David will
help me unwind next year..

This series solves this problem by using the two-step sequence of
iommu_group_set_dma_owner()/iommu_attach_group() and conceptually
redefining how iommu_attach_group() works to require the external
caller to have done the iommu_group_set_dma_owner() for it. This is
why the series has three APIs: the VFIO special one assumes an
external iommu_group_set_dma_owner(). It just happens that this is
exactly the same code as iommu_attach_group() today.

As for why DMA_OWNER_PRIVATE_DOMAIN_USER exists: VFIO doesn't have an
iommu_domain at this point, but it still needs the iommu core to
detach the default domain. This is what the _USER does.
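
Putting the two steps together, the VFIO flow under this series looks
roughly like this (error handling elided, comments are my reading of
it):

	/*
	 * Step 1: claim the group. Because the owner is
	 * DMA_OWNER_PRIVATE_DOMAIN_USER the core also detaches the
	 * default domain, even though no user domain exists yet.
	 */
	ret = iommu_group_set_dma_owner(group->iommu_group,
					DMA_OWNER_PRIVATE_DOMAIN_USER,
					f.file);

	/*
	 * Step 2 (type1 only, spapr stops after step 1): attach the
	 * user's iommu_domain. iommu_attach_group() trusts the caller
	 * already did step 1.
	 */
	ret = iommu_attach_group(domain, group->iommu_group);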

Soo..

There is another way to organize this and perhaps it does make more
sense. I will try to sketch it briefly in email, try to imagine the
gaps..

API family (== the equivalent in this series):

 iommu_device_use_dma_api(dev);
    == iommu_device_set_dma_owner(dev, DMA_OWNER_DMA_API, NULL);

 iommu_group_set_dma_owner(group, file);
    == iommu_device_set_dma_owner(dev, DMA_OWNER_PRIVATE_DOMAIN_USER,
                                  file);
    Always detaches all domains from the group

 iommu_attach_device(domain, dev)
    == as is in this patch
    dev and domain are 1:1

 iommu_attach_device_shared(domain, dev)
    == as is in this patch
    dev and domain are N:1
    * could just be the same as iommu_attach_device

 iommu_replace_group_domain(group, old_domain, new_domain)
    Makes the group point at new_domain. new_domain can be NULL.
    (rough sketch below)

 iommu_device_unuse_dma_api(dev)
    == iommu_device_release_dma_owner() in this patch

 iommu_group_release_dma_owner(group)
    == iommu_detach_group() && iommu_group_release_dma_owner()
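
A rough sketch of what iommu_replace_group_domain() would do
internally (assuming the usual group mutex and the existing __iommu_*
helpers; the old_domain argument exists to catch callers that lost
track of what is attached):

	int iommu_replace_group_domain(struct iommu_group *group,
				       struct iommu_domain *old_domain,
				       struct iommu_domain *new_domain)
	{
		int ret = 0;

		mutex_lock(&group->mutex);
		if (group->domain != old_domain) {
			ret = -EBUSY;
			goto out_unlock;
		}
		if (old_domain)
			__iommu_detach_group(old_domain, group);
		if (new_domain)
			ret = __iommu_attach_group(new_domain, group);
	out_unlock:
		mutex_unlock(&group->mutex);
		return ret;
	}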

VFIO would use the sequence:

   iommu_group_set_dma_owner(group, file);
   iommu_replace_group_domain(group, NULL, domain_1);
   iommu_replace_group_domain(group, domain_1, domain_2);
   iommu_group_release_dma_owner(group);

Simple devices would use

   iommu_attach_device(domain, dev);
   iommu_detach_device(domain, dev);

Tegra would use:

   iommu_attach_device_shared(domain, dev);
   iommu_detach_device_shared(domain, dev);
   // Or not, if people agree we should not mark this

DMA API would have the driver core dma_configure do:

   iommu_device_use_dma_api(dev);
   dev->driver->probe()
   iommu_device_unuse_dma_api(dev);
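
ie something with this shape in the driver core (the function names
here are made up, the real probe path has more steps):

	static int driver_bind(struct device *dev, struct device_driver *drv)
	{
		int ret;

		/* Fails if eg VFIO already owns the group */
		ret = iommu_device_use_dma_api(dev);
		if (ret)
			return ret;

		ret = drv->probe(dev);
		if (ret)
			iommu_device_unuse_dma_api(dev);
		return ret;
	}

	static void driver_unbind(struct device *dev, struct device_driver *drv)
	{
		drv->remove(dev);
		/* The group can be claimed by someone else again */
		iommu_device_unuse_dma_api(dev);
	}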

It is more APIs overall, but perhaps they have a much clearer
purpose.

I think it would be clear why iommu_group_set_dma_owner(), which
actually does a detach, is not the same thing as iommu_attach_device().

I'm not sure if this entirely eliminates
DMA_OWNER_PRIVATE_DOMAIN_USER, or not, but at least it isn't in the
API.

Is it better?

> What VFIO wants is (conceptually[1]) "attach this device to my domain,
> provided it and any other devices in its group are managed by a driver I
> approve of."

Yes, sure, "conceptually". But there are troublesome details.

> VFIO will also need a struct device anyway, because once I get back from my
> holiday in the new year I need to start working with Simon on evolving the
> rest of the API away from bus->iommu_ops to dev->iommu so we can finally
> support IOMMU drivers coexisting[2].

For VFIO it would be much easier to get the ops from the struct
iommu_group (eg via iommu_group->default_domain->ops, or whatever).
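
Eg a trivial helper would do (name invented here, and assuming every
group keeps a default_domain that carries the ops):

	static const struct iommu_ops *iommu_group_ops(struct iommu_group *group)
	{
		/*
		 * The default domain is allocated by the group's iommu
		 * driver, so the ops are recoverable without any
		 * struct device.
		 */
		return group->default_domain->ops;
	}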

> Indeed I agree with that second point, I'm just increasingly baffled how
> it's not clear to you that there is only one fundamental use-case here.
> Perhaps I'm too familiar with the history to objectively see how unclear the
> current state of things might be :/

I think it is because you are just not familiar with the dark corners
of VFIO.

VFIO has a special case, I outlined above.

> > This is taking 426a to its logical conclusion and *removing* the
> > group API from the drivers entirely. This is desirable because drivers
> > cannot do anything sane with the group.
>
> I am in complete agreement with that (to the point of also not liking patch
> #6).

Unfortunately, patch #6 exists only because VFIO needs to use the
group as a handle.

Jason