Re: [PATCH v6 02/11] driver core: Add dma_cleanup callback in bus_type

From: Robin Murphy
Date: Wed Feb 23 2022 - 08:04:13 EST


On 2022-02-23 05:01, Lu Baolu wrote:
On 2/23/22 7:53 AM, Jason Gunthorpe wrote:
To spell it out, the scheme I'm proposing looks like this:
Well, I already got this, it is what is in driver_or_DMA_API_token()
that matters

I think you are suggesting to do something like:

    if (!READ_ONCE(dev->driver) ||  ???)
        return NULL;
    return group;  // A DMA_API 'token'

Which is locklessly reading dev->driver, and why you are talking about
races, I guess.


I am afraid that we are not able to implement a race-free
driver_or_DMA_API_token() helper. The lock problem between the IOMMU
core and driver core always exists.

It's not race-free. My point is that the races aren't harmful because what we might infer from the "wrong" information still leads to the right action. dev->driver is obviously always valid and constant for *claiming* ownership, since that either happens for the DMA API in the middle of really_probe() binding driver to dev, or while driver is actively using dev and calling iommu_group_claim_dma_owner(). The races exist during remove, but both probe and remove are serialised on the group mutex after respectively setting/clearing dev->driver, there are only 4 possibilities for the state of any other group sibling "tmp" during the time that dev holds that mutex in its remove path:

1 - tmp->driver is non-NULL because tmp is already bound.
1.a - If tmp->driver->driver_managed_dma == 0, the group must currently be DMA-API-owned as a whole. Regardless of what driver dev has unbound from, its removal does not release someone else's DMA API (co-)ownership.
1.b - If tmp->driver->driver_managed_dma == 1 and tmp->driver == group->owner, then dev must have unbound from the same driver, but either way that driver has not yet released ownership so dev's removal does not change anything.
1.c - If tmp->driver->driver_managed_dma == 1 and tmp->driver != group->owner, it doesn't matter. Even if tmp->driver is currently waiting to attempt to claim ownership it can't do so until we release the mutex.

2 - tmp->driver is non-NULL because tmp is in the process of binding.
2.a - If tmp->driver->driver_managed_dma == 0, tmp can be assumed to be waiting on the group mutex to claim DMA API ownership.
2.a.i - If the group is DMA API owned, this race is simply a short-cut to case 1.a - dev's ownership is effectively handed off directly to tmp, rather than potentially being released and immediately reclaimed. Once tmp gets its turn, it finds the group already DMA-API-owned as it wanted and all is well. This may be "unfair" if an explicit claim was also waiting, but not incorrect.
2.a.ii - If the group is driver-owned, it doesn't matter. Removing dev does not change the current ownership, and tmp's probe will eventually get its turn and find whatever it finds at that point in future.
2.b - If tmp->driver->driver_managed_dma == 1, it doesn't matter. Either that driver already owns the group, or it might try to claim it after we've resolved dev's removal and released the mutex, in which case it will find whatever it finds.

3 - tmp->driver is NULL because tmp is unbound. Obviously no impact.

4 - tmp->driver is NULL because tmp is in the process of unbinding.
4.a - If the group is DMA-API-owned, either way tmp has no further influence.
4.a.i - If tmp has unbound from a driver_managed_dma=0 driver, it must be waiting to release its DMA API ownership, thus if tmp would otherwise be the only remaining DMA API owner, the race is that dev's removal releases ownership on behalf of both devices. When tmp's own removal subsequently gets the mutex, it will either see that the group is already unowned, or maybe that someone else has re-claimed it in the interim, and either way do nothing, which is fine.
4.a.ii - If tmp has unbound from a driver_managed_dma=1 driver, it doesn't matter, as in case 1.c.
4.b - If the group is driver-owned, it doesn't matter. That ownership can only change if that driver releases it, which isn't happening while we hold the mutex.

As I said yesterday, I'm really just airing out an idea here; I might write up some proper patches as part of the bus ops work, and we can give it proper scrutiny then.

For example, when we implemented iommu_group_store_type() to change the
default domain type of a device through sysfs, we could only comprised
and limited this functionality to singleton groups to avoid the lock
issue.

Indeed, but once the probe and remove paths for grouped devices have to serialise on the group mutex, as we're introducing here, the story changes and we gain a lot more power. In fact that's a good point I hadn't considered yet - that sysfs constraint is functionally equivalent to the one in iommu_attach_device(), so once we land this ownership concept we should be free to relax it from "singleton" to "unowned" in much the same way as your other series is doing for attach.

Unfortunately, that compromise cannot simply applied to the problem to
be solved by this series, because the iommu core cannot abort the driver
binding when the conflict is detected in the bus notifier.

No, I've never proposed that probe-time DMA ownership can be claimed from a notifier, we all know why that doesn't work. It's only the dma_cleanup() step that *could* be punted back to iommu_bus_notifier vs. the driver core having to know about it. Either way we're still serialising remove/failure against probe/remove of other devices in a group, and that's the critical aspect.

Thanks,
Robin.