Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups

From: Jason Gunthorpe
Date: Thu Dec 23 2021 - 21:50:58 EST


On Fri, Dec 24, 2021 at 09:30:17AM +0800, Lu Baolu wrote:
> Hi Jason,
>
> On 12/23/21 10:03 PM, Jason Gunthorpe wrote:
> > > > I think it would be clear why iommu_group_set_dma_owner(), which
> > > > actually does detach, is not the same thing as iommu_attach_device().
> > > iommu_device_set_dma_owner() will eventually call
> > > iommu_group_set_dma_owner(). I didn't get why
> > > iommu_group_set_dma_owner() is special and need to keep.
> > Not quite, they would not call each other, they have different
> > implementations:
> >
> > int iommu_device_use_dma_api(struct device *device)
> > {
> > 	struct iommu_group *group = device->iommu_group;
> >
> > 	if (!group)
> > 		return 0;
> >
> > 	mutex_lock(&group->mutex);
> > 	if (group->owner_cnt != 0 ||
> > 	    group->domain != group->default_domain) {
> > 		mutex_unlock(&group->mutex);
> > 		return -EBUSY;
> > 	}
> > 	group->owner_cnt = 1;
> > 	group->owner = NULL;
> > 	mutex_unlock(&group->mutex);
> > 	return 0;
> > }
>
> It seems that this function doesn't work for multi-device groups. When
> the user unbinds all native drivers from devices in the group and start
> to bind them with vfio-pci and assign them to user, how could iommu know
> whether the group is viable for user?

That is just a mistake, I wrote it very quickly. It should work as your
patch had it, with a ++. More like this:

int iommu_device_use_dma_api(struct device *device)
{
	struct iommu_group *group = device->iommu_group;

	if (!group)
		return 0;

	mutex_lock(&group->mutex);
	if (group->owner_cnt != 0) {
		if (group->domain != group->default_domain ||
		    group->owner != NULL) {
			mutex_unlock(&group->mutex);
			return -EBUSY;
		}
	}
	group->owner_cnt++;
	mutex_unlock(&group->mutex);
	return 0;
}
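
To see how the ++ version behaves for a multi-device group, here is a
minimal userspace model of the function above. It is a sketch, not kernel
code: struct group_model, model_use_dma_api() and model_unuse_dma_api()
are hypothetical names, the mutex is left out, and plain pointers stand
in for the domains. The "unuse" counterpart is an assumption and does not
appear in this thread.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the struct iommu_group fields used above. */
struct group_model {
	unsigned int owner_cnt;
	void *owner;			/* non-NULL once a user owns the group */
	const void *domain;		/* currently attached domain */
	const void *default_domain;
};

/* Same predicate as iommu_device_use_dma_api() above, locking omitted. */
static int model_use_dma_api(struct group_model *group)
{
	if (!group)
		return 0;

	if (group->owner_cnt != 0) {
		if (group->domain != group->default_domain ||
		    group->owner != NULL)
			return -EBUSY;
	}
	group->owner_cnt++;
	return 0;
}

/* Assumed counterpart: drop one DMA API user of the group. */
static void model_unuse_dma_api(struct group_model *group)
{
	if (group && group->owner_cnt != 0)
		group->owner_cnt--;
}
```

In this model every kernel driver bound in the group bumps owner_cnt, and
the call starts failing with -EBUSY as soon as the group has a non-NULL
owner or is no longer on its default domain.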

> > See, we get rid of the enum as a multiplexor parameter, each API does
> > only what it needs, they don't call each other.
>
> I like the idea of removing enum parameter and make the API name
> specific. But I didn't get why they can't call each other even the
> data in group is the same.

Well, I think when you type them out you'll find they don't work the
same. I.e. iommu_group_set_dma_owner() does __iommu_detach_group(),
which iommu_device_use_dma_api() definitely doesn't want to
do. iommu_device_use_dma_api() checks the domain while
iommu_group_set_dma_owner() must not.

This is basically the issue: all the places touching owner_cnt are
superficially the same, but each uses a different predicate. Given the
predicate is more than half the code, I wouldn't try to share the rest
of it. But maybe when it is all typed in something will become
obvious?
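
A rough userspace illustration of the two predicates side by side. Only
the DMA API predicate is taken from the code earlier in this mail. The
set-owner predicate (unclaimed, or already claimed by the same owner) is
my assumption for the sake of the example, as are the model_* and may_*
names, the error codes, and using a NULL domain to model the result of
__iommu_detach_group().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the struct iommu_group fields involved. */
struct group_model {
	unsigned int owner_cnt;
	void *owner;
	const void *domain;
	const void *default_domain;
};

/* DMA API user: checks both the owner and the attached domain. */
static bool may_use_dma_api(const struct group_model *g)
{
	return g->owner_cnt == 0 ||
	       (g->domain == g->default_domain && g->owner == NULL);
}

/* Assumed user-owner predicate: never looks at the domain. */
static bool may_set_dma_owner(const struct group_model *g, const void *owner)
{
	return g->owner_cnt == 0 || g->owner == owner;
}

static int model_use_dma_api(struct group_model *g)
{
	if (!may_use_dma_api(g))
		return -EBUSY;
	g->owner_cnt++;
	return 0;
}

static int model_set_dma_owner(struct group_model *g, void *owner)
{
	if (!owner)
		return -EINVAL;
	if (!may_set_dma_owner(g, owner))
		return -EBUSY;
	g->domain = NULL;	/* models __iommu_detach_group() */
	g->owner = owner;
	g->owner_cnt++;
	return 0;
}
```

Once the predicate and the detach step are taken out, all the two
functions have in common is the owner_cnt++, which is why sharing code
between them buys very little.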

> > We don't need _USER anymore because iommu_group_set_dma_owner() always
> > does detach, and iommu_replace_group_domain() avoids ever reassigning
> > default_domain. The special USER behavior falls out automatically.
>
> This means we will grow more group-centric interfaces. My understanding
> is the opposite that we should hide the concept of group in IOMMU
> subsystem, and the device drivers only faces device specific interfaces.

Ideally group interfaces would be reduced, but in this case VFIO needs
the group. It has sort of a fundamental problem with its uAPI, which
expects the container to be fully set up with a domain at the moment the
group is attached. So deferring domain setup to when the device is
available becomes a user visible artifact. Whether that matters is a
whole research question of its own, and not one that needs answering
for this series.

We also can't just pull a device out of thin air: a device that hasn't
been probed() hasn't even had dma_configure() called, let alone the
lifetime and locking problems with that kind of idea.

So leaving it as a group interface makes the most sense,
particularly for this series, which is really about fixing the sharing
model in the iommu core and deleting the BUG_ONs.

Also, I'm sitting here looking at Robin's idea that
iommu_attach_device() and iommu_attach_device_shared() should be the
same - and that does seem conceptually appealing, but not so simple.

The difference is that iommu_attach_device_shared() requires the
device_driver to have set suppress_auto_claim_dma_owner while
iommu_attach_device() does not (Lu, please do add a kdoc comment
documenting this, and maybe a WARN_ON check to enforce it).
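
Something along these lines, shown as a userspace model so it can be
compiled on its own. Everything here is a hypothetical stand-in:
struct driver_model and struct device_model replace the real
device_driver and device, model_warn_on() only imitates WARN_ON(), and
the -EINVAL return is an assumption. The kdoc wording is a suggestion,
not text from the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, trimmed-down stand-ins for the kernel structures. */
struct driver_model {
	bool suppress_auto_claim_dma_owner;
};

struct device_model {
	const struct driver_model *driver;
};

/* Userspace imitation of WARN_ON(): report and return the condition. */
static bool model_warn_on(bool cond, const char *what)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", what);
	return cond;
}

/**
 * model_attach_device_shared() - attach a device in a multi-device group
 * @dev: device whose driver manages DMA ownership of the group itself
 *
 * The driver bound to @dev must have set suppress_auto_claim_dma_owner.
 *
 * Return: 0 on success, -EINVAL if the driver did not set the flag.
 */
static int model_attach_device_shared(const struct device_model *dev)
{
	if (model_warn_on(!dev->driver ||
			  !dev->driver->suppress_auto_claim_dma_owner,
			  "suppress_auto_claim_dma_owner not set"))
		return -EINVAL;

	/* The actual attach is omitted in this model. */
	return 0;
}
```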

Changing all 11 drivers using iommu_attach_device() to also set
suppress_auto_claim_dma_owner is something to do in another series,
merged properly through the driver trees, if it is done at all. So
this series needs to keep both APIs.

However, what we should be doing is fixing iommu_attach_device() to
rely on the owner_cnt, and not iommu_group_device_count().

Basically its logic should instead check for owner_cnt == 1 and
then transform the group from a DMA_OWNER_DMA_API to a
DMA_OWNER_PRIVATE_DOMAIN. If we get rid of the enum then this happens
naturally by making group->domain != group->default_domain. All that
is missing is the owner_cnt == 1 check and some commentary. Again,
also add a WARN_ON and documentation that
suppress_auto_claim_dma_owner must not be set. (TBH, I thought this was
discussed already, I haven't yet carefully checked v4..)

Then, we rely on iommu_device_use_dma_api() to block further users of
the group and remove the iommu_group_device_count() hack.
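
A minimal userspace sketch of that flow, under stated assumptions:
struct group_model and the model_* functions are hypothetical names,
locking is omitted, and the -EBUSY return values in
model_attach_device() are illustrative rather than taken from the
existing code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the struct iommu_group fields involved. */
struct group_model {
	unsigned int owner_cnt;
	void *owner;
	const void *domain;
	const void *default_domain;
};

/* Predicate from iommu_device_use_dma_api(), locking omitted. */
static int model_use_dma_api(struct group_model *g)
{
	if (g->owner_cnt != 0) {
		if (g->domain != g->default_domain || g->owner != NULL)
			return -EBUSY;
	}
	g->owner_cnt++;
	return 0;
}

/*
 * Sketch of iommu_attach_device() keyed on owner_cnt instead of the
 * number of devices in the group.
 */
static int model_attach_device(struct group_model *g, const void *new_domain)
{
	/* Exactly one DMA API user (the calling driver), no user owner. */
	if (g->owner_cnt != 1 || g->owner != NULL)
		return -EBUSY;

	/* Refuse if the group has already left its default domain. */
	if (g->domain != g->default_domain)
		return -EBUSY;

	/* domain != default_domain now marks the group as private. */
	g->domain = new_domain;
	return 0;
}
```

After a successful model_attach_device(), a second driver in the same
group calling model_use_dma_api() gets -EBUSY because the domain check
fails, so no device count is needed to keep other users out.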

Jason