Re: [PATCH v6 02/11] driver core: Add dma_cleanup callback in bus_type

From: Robin Murphy
Date: Tue Feb 22 2022 - 05:59:05 EST


On 2022-02-21 23:48, Jason Gunthorpe wrote:
On Mon, Feb 21, 2022 at 08:43:33PM +0000, Robin Murphy wrote:
On 2022-02-19 07:32, Christoph Hellwig wrote:
So we are back to the callback madness instead of the nice and simple
flag? Sigh.

TBH, I *think* this part could be a fair bit simpler. It looks like this
whole callback mess is effectively just to decrement
group->owner_cnt, but

Right, the new callback is because of Greg's push to put all the work
into the existing bus callback. Having symetrical callbacks is
cleaner.

I'll continue to disagree that having tons more code purely for the sake of it is cleaner. The high-level requirements are fundamentally asymmetrical - ownership has to be actively claimed by the bus code at a point during probe where it can block probing if necessary, but it can be released anywhere at all during remove since that cannot fail. I don't personally see the value in a bunch of code bloat for no reason other than trying to pretend that an asymmetrical thing isn't.

We already have other concepts in the IOMMU API, like the domain ops lifecycle, which are almost self-contained but for needing an external prod to get started, so I'm naturally viewing this one the same way.

since we should only care about ownership at probe, hotplug, and other
places well outside critical fast-paths, I'm not sure we really need to keep
track of that anyway - it can always be recalculated by walking the
group->devices list,

It has to be locked against concurrent probe, and there isn't
currently any locking scheme that can support this. The owner_cnt is
effectively a new lock for this purpose. It is the same issue we
talked about with that VFIO patch you showed me.

Huh? How hard is it to hold group->mutex when reading or writing group->owner? Walking the list would only have to be done for *releasing* ownership and I'm pretty sure all the races there are benign - only probe/remove of the driver (or DMA API token) matching a current non-NULL owner matter; if two removes race, the first might end up releasing ownership "early", but the second is waiting to do that anyway so it's OK; if a remove races with a probe, the remove may end up leaving the owner set, but the probe is waiting to do that anyway so it's OK.

So, using the group->device_list would require adding something else
somewhere - which I think should happen when someone has
justification for another use of whatever that something else is.

Also, Greg's did have an objection to the the first version, with code
living in dd.c, that was basically probe time performance. I'm not
sure making this slower would really be welcomed..

Again, this does not affect probe at all, only remove, and TBH I'd expect the performance impact to be negligible. On any sensible system, IOMMU groups are not large. Heck, in the typical case I'd guess it's no worse than the time we currently spend on group notifiers. I was just making the point that there should not be a significant performance argument for needing to cache a count value.

and some of the relevant places have to do that anyway.

???

I was looking at iommu_group_remove_device() at the time, but of course we should always have seen an unbind before we get there - that one's on me, sorry for the confusion.

It has to be s It should be pretty straightforward for
iommu_bus_notifier to clear group->owner automatically upon an
unbind of the matching driver when it's no longer bound to any other
devices in the group either.

That not_bound/unbind notifier isn't currently triggred during
necessary failure paths of really_probe().

Eh? Just look at the context of patch #2, let alone the rest of the function, and tell me how, if we can't rely on BUS_NOTIFY_DRIVER_NOT_BOUND, calling .dma_cleanup *from the exact same place* is somehow more reliable?

AFAICS, a notifier handling both BUS_NOTIFY_UNBOUND_DRIVER and BUS_NOTIFY_DRIVER_NOT_BOUND would be directly equivalent to the callers of .dma_cleanup here.

Even if this was patched up, it looks like spaghetti to me..

use-case) then it should be up to VFIO to decide when it's finally
finished with the whole group, rather than pretending we can keep
track of nested ownership claims from inside the API.

What nesting?

The current implementation of iommu_group_claim_dma_owner() allows owner_cnt to increase beyond 1, and correspondingly requires iommu_group_release_dma_owner() to be called the same number of times. It doesn't appear that VFIO needs that, and I'm not sure I'd trust any other potential users to get it right either.

Furthermore, If Greg was willing to compromise just far enough to let us put
driver_managed_dma in the 3-byte hole in the generic struct
device_driver,

Space was not an issue, the earlier version of this switched an
existing bool to a bitfield.

we wouldn't have to have quite so much boilerplate repeated across the
various bus implementations (I'm not suggesting to move any actual calls
back into the driver core, just the storage of flag itself).

Not sure that makes sense.. But I don't understand why we need to copy
and paste this code into every bus's dma_configure *shrug*

That's what I'm saying - right now every bus *has* to have a specific .dma_configure implementation if only to retrieve a semantically-identical flag from each bus-specific structure; there is zero possible code-sharing. With a generically-defined flag, there is some possibility for code-sharing now (e.g. patch #3 wouldn't be needed), and the potential for more in future.

FWIW I have some ideas for re-converging .dma_configure in future
which I think should probably be able to subsume this into a
completely generic common path, given a common flag.

This would be great!

Indeed, so if we're enthusiastic about future cleanup that necessitates a generic flag, why not make the flag generic to start with?

Thanks,
Robin.