Re: [PATCH v4 4/7] iommu: Switch __iommu_domain_alloc() to device ops

From: Robin Murphy
Date: Wed Oct 04 2023 - 13:23:15 EST


On 02/10/2023 8:36 pm, Jason Gunthorpe wrote:
On Mon, Oct 02, 2023 at 08:02:23PM +0100, Robin Murphy wrote:
On 02/10/2023 3:16 pm, Jason Gunthorpe wrote:
On Mon, Oct 02, 2023 at 02:49:12PM +0100, Robin Murphy wrote:
@@ -2120,20 +2120,30 @@ static struct iommu_domain *__iommu_domain_alloc(const struct iommu_ops *ops,
return domain;
}
-static struct iommu_domain *
-__iommu_group_domain_alloc(struct iommu_group *group, unsigned int type)
+static int __iommu_domain_alloc_dev(struct device *dev, void *data)
{

Why?

Because out of 3 callsites, two were in a context which now needed to
make the iommu_group_first_dev() call itself anyway,

I don't see it. Why not just do this?

static int __iommu_domain_alloc_dev(struct device *dev, void *data)
{
/* FIXME: This is not correctly locked */
struct iommu_group *group = iommu_group_get(dev);
struct group **alloc_group = data;

if (!group)
return 0;

mutex_lock(&group->mutex);
/* Theoretically we could have raced against release */
if (list_empty(&group->devices)) {
mutex_unlock(&group->mutex);
iommu_group_put(group);
return 0;
}

*alloc_group = group;
return 1;
}

struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus)
{
struct iommu_group *alloc_group;
struct iommu_domain *dom;

/* Only one driver is supported with this interface */
if (WARN_ON(iommu_num_drivers > 1))
return NULL;

bus_for_each_dev(bus, NULL, &alloc_group, __iommu_domain_alloc_dev);
if (!alloc_group)
return NULL;
dom = __iommu_group_domain_alloc(alloc_group, IOMMU_DOMAIN_UNMANAGED);
mutex_unlock(&alloc_group->mutex);
iommu_group_put(alloc_group);
return dom;
}
EXPORT_SYMBOL_GPL(iommu_domain_alloc);

(and ++/-- iommu_num_drivers in iommu_device_register)

One patch, it's pretty easy???

Sure, that would then leave two callsites for __iommu_group_domain_alloc(), which might in turn justify leaving it factored out, but the endgame is to get a device to resolve dev_iommu_ops(), so if we're starting with a suitable device in hand then using its group to immediately re-derive a device again when we *could* trivially pass the device straight through is silly and overcomplicated.

However it turns out that we can't easily touch the group here at all without then needing dependent changes in VFIO, so rather than invite any more scope creep at this point I'm going to respin this as a purely sideways step that sticks to resolving ops from a bus, and save any further device/instance niceties for step 2 when I have to touch all the external callers anyway.

Um... Good? I mean in 3/4 cases it's literally the exact same code just
factored out again, while the one I've added picks some arbitrary device
in a different way.

Sure, but the whole point was to make it obvious that there was no
direct linkage from the various dev parameters we have in places and
what dev will be passed by the driver.

But that's the argument that makes no sense - it happens to be the case in the current code that all default domain allocation sites are already buried in several layers worth of group-based machinery, but that in itself holds no significance to the allocation process. I maintain that it is simply misleading to pretend that (taking a little artistic license with types) a round trip through dev->iommu_group->devices->next->dev->iommu holds any significance over just dev->iommu. There's hardly anything "obvious" about it either - it's buried in core code that driver authors have little reason to even look at.

If I'm implementing a driver, I'm going to see the signature of the op I need to implement, which tells me to allocate a domain and gives me a device to allocate it for, and for reference I'm going to look at how other drivers implement that op. I would not start by picking through the core code for the callers of the caller of that op, to see some weird code that looks redundant when in practice I observe it resolving a device back to itself, and think "ah yes, now I see the undocumented significance of an idea that only existed in Jason's head". If you don't want an IOMMU driver to be able to think the particular device is important, don't pass a bloody device! If the only intended purpose is for the driver to resolve dev->iommu instance data, then have the core do that and just pass the dev_iommu or iommu_dev directly; ambiguity solved.

If one is expected to look at subtleties 2 or 3 levels down the callchain of an API in order to understand how to implement that API correctly, that is *a badly designed API*, and piling on more "correctness theatre" code to attempt to highlight the bad design does not address the real problem there.

Everything passes through
__iommu_group_domain_alloc() and at the end of the day that should be
the only allocation API.

But *why* should it? What's inherently more special about a group vs. a device or an IOMMU instance (especially with those being the things drivers actually deal with in relation to domains)?

If you've noticed, between here and patch #3 I already end up effectively enforcing that devices in the same group must have identical device ops - that's arguably an even stronger requirement than we need, but it fits the current drivers so we may as well not relax it until proven necessary. So AFAICS the problem you're desperate to address is a theoretical one of a driver author going out of their way to implement a single .domain_alloc_paging implementation badly, in a way which would only affect the behaviour of that driver, and should be easy to spot in patch review anyway. Any complexity in core code in an attempt to make any difference to that seems about as worthwhile as trying to hold back the tide.

Thanks,
Robin.