Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups

From: Lu Baolu
Date: Thu Dec 23 2021 - 00:59:14 EST


Hi Robin and Jason,

On 12/23/21 8:57 AM, Jason Gunthorpe wrote:
On Wed, Dec 22, 2021 at 08:26:34PM +0000, Robin Murphy wrote:
On 21/12/2021 6:46 pm, Jason Gunthorpe wrote:
On Tue, Dec 21, 2021 at 04:50:56PM +0000, Robin Murphy wrote:

this proposal is the worst of both worlds, in that drivers still have to be
just as aware of groups in order to know whether to call the _shared
interface or not, except it's now entirely implicit and non-obvious.

Drivers are not aware of groups, where did you see that?

`git grep iommu_attach_group -- :^drivers/iommu :^include`

Did I really have to explain that?

Well, yes you did, because it shows you haven't understood my
question. After this series we deleted all those calls (though Lu, we
missed one of the tegra ones in staging, let's get it for the next
posting)

Yes, I will.


So, after this series, where do you see drivers being aware of groups?
If things are missed lets expect to fix them.

If the driver uses multiple struct devices and intends to connect them
all to the same domain then it uses the _shared variant. The only
difference between the two is the _shared varient lacks some of the
protections against driver abuse of the API.

You've lost me again; how are those intentions any different? Attaching one
device to a private domain is a literal subset of attaching more than one
device to a private domain.

Yes it is a subset, but drivers will malfunction if they are not
designed to have multi-attachment and wrongly get it, and there is
only one driver that does actually need this.

I maintain a big driver subsystem and have learned that grepability of
the driver mess for special cases is quite a good thing to
have. Forcing drivers to mark in code when they do something weird is
an advantage, even if it causes some small API redundancy.

However, if you really feel strongly this should really be one API
with the _shared implementation I won't argue it any further.

So then we have the iommu_attach_group() interface for new code (and still
nobody has got round to updating the old code to it yet), for which
the

This series is going in the direction of eliminating
iommu_attach_group() as part of the driver
interface. iommu_attach_group() is repurposed to only be useful for
VFIO.

We can also remove iommu_attach_group() in VFIO because it is
essentially equivalent to

iommu_group_for_each_dev(group, iommu_attach_device(dev))


properly, or iommu_attach_group() with a potentially better interface and
actual safety. The former is still more prevalent (and the interface
argument compelling), so if we put the new implementation behind that, with
the one tweak of having it set DMA_OWNER_PRIVATE_DOMAIN automatically, kill
off iommu_attach_group() by converting its couple of users,

This is what we did, iommu_attach_device() & _shared() are to be the
only interface for the drivers, and we killed off the
iommu_attach_group() couple of users except VFIO (the miss of
drivers/staging excepted)

and not only have we solved the VFIO problem but we've also finally
updated all the legacy code for free! Of course you can have a
separate version for VFIO to attach with
DMA_OWNER_PRIVATE_DOMAIN_USER if you like, although I still fail to
understand the necessity of the distinction.

And the seperate version for VFIO is called 'iommu_attach_group()'.

Lu, it is probably a good idea to add an assertion here that the group
is in DMA_OWNER_PRIVATE_DOMAIN_USER to make it clear that
iommu_attach_group() is only for VFIO.

VFIO has a special requirement that it be able to do:

+ ret = iommu_group_set_dma_owner(group->iommu_group,
+ DMA_OWNER_PRIVATE_DOMAIN_USER, f.file);

Without having a iommu_domain to attach.

This is because of the giant special case that PPC made of VFIO's
IOMMU code. PPC (aka vfio_iommu_spapr_tce.c) requires the group
isolation that iommu_group_set_dma_owner() provides, but does not
actually have an iommu_domain and can not/does not call
iommu_attach_group().

Fixing this is a whole other giant adventure I'm hoping David will
help me unwind next year..

This series solves this problem by using the two step sequence of
iommu_group_set_dma_owner()/iommu_attach_group() and conceptually
redefining how iommu_attach_group() works to require the external
caller to have done the iommu_group_set_dma_owner() for it. This is
why the series has three APIs, because the VFIO special one assumes
external iommu_group_set_dma_owner(). It just happens that is exactly
the same code as iommu_attach_group() today.

As for why does DMA_OWNER_PRIVATE_DOMAIN_USER exist? VFIO doesn't have
an iommu_domain at this point but it still needs the iommu core to
detatch the default domain. This is what the _USER does.

There is also a contract that after the USER ownership is claimed the
device could be accessed by userspace through the MMIO registers. So,
a device could be accessible by userspace before a user-space I/O
address is attached.


Soo..

There is another way to organize this and perhaps it does make more
sense. I will try to sketch briefly in email, try to imagine the
gaps..

API family (== compares to this series):

iommu_device_use_dma_api(dev);
== iommu_device_set_dma_owner(dev, DMA_OWNER_DMA_API, NULL);

iommu_group_set_dma_owner(group, file);
== iommu_device_set_dma_owner(dev, DMA_OWNER_PRIVATE_DOMAIN_USER,
file);
Always detaches all domains from the group

I hope we can drop all group variant APIs as we already have the per-
device interfaces, just iterate all device in the group and call the
device API.


iommu_attach_device(domain, dev)
== as is in this patch
dev and domain are 1:1

iommu_attach_device_shared(domain, dev)
== as is in this patch
dev and domain are N:1
* could just be the same as iommu_attach_device

iommu_replace_group_domain(group, old_domain, new_domain)
Makes group point at new_domain. new_domain can be NULL.

iommu_device_unuse_dma_api(dev)
== iommu_device_release_dma_owner() in this patch

iommu_group_release_dma_owner(group)
== iommu_detatch_group() && iommu_group_release_dma_owner()

VFIO would use the sequence:

iommu_group_set_dma_owner(group, file);
iommu_replace_group_domain(group, NULL, domain_1);
iommu_replace_group_domain(group, domain_1, domain_2);
iommu_group_release_dma_owner(group);

Simple devices would use

iommu_attach_device(domain, dev);
iommu_detatch_device(domain, dev);

Tegra would use:

iommu_attach_device_shared(domain, dev);
iommu_detatch_device_shared(domain, dev);
// Or not, if people agree we should not mark this

DMA API would have the driver core dma_configure do:
iommu_device_use_dma_api(dev);
dev->driver->probe()
iommu_device_unuse_dma_api(dev);

It is more APIs overall, but perhaps they have a much clearer
purpose.

I think it would be clear why iommu_group_set_dma_owner(), which
actually does detatch, is not the same thing as iommu_attach_device().

iommu_device_set_dma_owner() will eventually call
iommu_group_set_dma_owner(). I didn't get why
iommu_group_set_dma_owner() is special and need to keep.


I'm not sure if this entirely eliminates
DMA_OWNER_PRIVATE_DOMAIN_USER, or not, but at least it isn't in the
API.

Is it better?

Perhaps I missed anything. I have a simpler idea. We only need to have
below interfaces:

iommu_device_set_dma_owner(dev, owner);
iommu_device_release_dma_owner(dev, owner);
iommu_attach_device(domain, dev, owner);
iommu_detach_device(domain, dev);

All existing drivers calling iommu_attach_device() remain unchanged
since we already have singleton group enforcement. We only need to add
a default owner type.

For multiple-device group, like drm/tegra, the drivers should claim the
PRIVATE_DOMAIN ownership and call iommu_attach_device(domain, dev,
PRIVATE_DOMAIN) explicitly.

The new iommu_attach_device(domain, dev, owner) is a mix of the existing
iommu_attach_device() and the new iommu_attach_device_shared(). That
means,
if (group_is_singleton(group))
__iommu_atttach_device(domain, dev)
else
__iommu_attach_device_shared(domain, dev, owner)

The group variant interfaces will be deprecated and replace with the
device ones.

Sorry if I missed anything.


What VFIO wants is (conceptually[1]) "attach this device to my domain,
provided it and any other devices in its group are managed by a driver I
approve of."

Yes, sure, "conceptually". But, there are troublesome details.

VFIO will also need a struct device anyway, because once I get back from my
holiday in the new year I need to start working with Simon on evolving the
rest of the API away from bus->iommu_ops to dev->iommu so we can finally
support IOMMU drivers coexisting[2].

For VFIO it would be much easier to get the ops from the struct
iommu_group (eg via iommu_group->default_domain->ops, or whatever).

Indeed I agree with that second point, I'm just increasingly baffled how
it's not clear to you that there is only one fundamental use-case here.
Perhaps I'm too familiar with the history to objectively see how unclear the
current state of things might be :/

I think it is because you are just not familiar with the dark corners
of VFIO.

VFIO has a special case, I outlined above.

This is taking 426a to it's logical conclusion and *removing* the
group API from the drivers entirely. This is desirable because drivers
cannot do anything sane with the group.

I am in complete agreement with that (to the point of also not liking patch
#6).

Unfortunately patch #6 is only because of VFIO needing to use the
group as a handle.

Jason


Best regards,
baolu