Re: [PATCH v4 4/7] iommu: Switch __iommu_domain_alloc() to device ops

From: Jason Gunthorpe
Date: Wed Oct 04 2023 - 17:35:10 EST


On Wed, Oct 04, 2023 at 06:23:05PM +0100, Robin Murphy wrote:

> Sure, that would then leave two callsites for __iommu_group_domain_alloc(),
> which might in turn justify leaving it factored out, but the endgame is to
> get a device to resolve dev_iommu_ops(), so if we're starting with a
> suitable device in hand then using its group to immediately re-derive a
> device again when we *could* trivially pass the device straight through is
> silly and overcomplicated.

Well, it is the same sort of interface as attaching a domain to a
group. The group stuff is like that.

> However it turns out that we can't easily touch the group here at all
> without then needing dependent changes in VFIO,

Why? VFIO only puts its weird groups on the mdev bus and so they will
never hit this.

> more scope creep at this point I'm going to respin this as a purely sideways
> step that sticks to resolving ops from a bus, and save any further
> device/instance niceties for step 2 when I have to touch all the external
> callers anyway.

One thing that I've recently realized is that this patch will break
everything when combined with the patches I'm sending to start doing
finalize during domain_alloc_paging() ops. It will randomly select a
device on the bus which is proably the wrong smmu for the device we
intend to attach to later.

So, to keep everything together the special bus domain alloc path has
to continue to pass in a NULL dev to domain_alloc_paging.

Tthat is certainly a pretty good reason why the above can't use the
existing group call and then you can make the case it doesn't have
enough users to exist anymore.

> But that's the argument that makes no sense - it happens to be the case in
> the current code that all default domain allocation sites are already buried
> in several layers worth of group-based machinery, but that in itself holds
> no significance to the allocation process.

Wel yes, but also it is what it is. The group stuff gets everywhere in
undesired ways. The original version of this did write it they way you
suggest and Kevin came with the idea to have a group alloc domain. I
tried it and it was indeed cleaner. One place to do the Quite Strange
thing of randomly choosing a device in a group, and combined with the
get group ops function it allowed all the group centric code stay group
centric until the very fringe.

Keeping it clear that the dev was infact just made up in alot of call
paths is a nice bonus.

> only existed in Jason's head". If you don't want an IOMMU driver to be able
> to think the particular device is important, don't pass a bloody device! If
> the only intended purpose is for the driver to resolve dev->iommu instance
> data, then have the core do that and just pass the dev_iommu or iommu_dev
> directly; ambiguity solved.

Oh, I wanted to do that! I looked at doing that even. It doesn't work
for amd and smmuv3 :( They needs to know if the group has any PASID
capable devices or not to choose a PASID compatible RID domain for the
DMA API use. Can't get that from the iommu_dev alone

> If one is expected to look at subtleties 2 or 3 levels down the callchain of
> an API in order to understand how to implement that API correctly, that is
> *a badly designed API*, and piling on more "correctness theatre" code to
> attempt to highlight the bad design does not address the real problem there.

Oh, I agree, this group stuff is troubled like that. It confuses all
kinds of unrelated things into one bundle.

The group stuff exists at the core layer where everything wants to
operate on groups. We attach devices to groups, we allocate domains
for groups. Then the device level works on devices, not groups, and we
have this confusion.

> If you've noticed, between here and patch #3 I already end up effectively
> enforcing that devices in the same group must have identical device ops -

Doesn't it just fall out of the probe rework? All devices are now
attached to a domain during probe. If the single domain in the group
fails attach because of your patch 4 then the device will not probe.

Unraveling that would require multiple domains per groups, which
seems like a huge undertaking :)

> So AFAICS the problem you're desperate to address is a theoretical
> one of a driver author going out of their way to implement a single
> .domain_alloc_paging implementation badly, in a way which would only
> affect the behaviour of that driver, and should be easy to spot in
> patch review anyway. Any complexity in core code in an attempt to
> make any difference to that seems about as worthwhile as trying to
> hold back the tide.

Well note quite.. There are quite obvious things that can trip up a
driver. The requirement for a special RID IOPTE format to get PASID
support is tricky. The drivers are all functionally OK because we link
PASID support to singleton groups, but even that took a long time to
arrive at.

So, please try a v5, lets see what it looks like with the
domain_alloc_paging and semi-locking fix?

Jason