Re: [PATCH v4 1/2] iommu/s390: Fix race with release_device ops

From: Jason Gunthorpe
Date: Tue Sep 06 2022 - 09:45:00 EST


On Mon, Sep 05, 2022 at 10:46:44AM +0100, Robin Murphy wrote:

> > I've been trying to understand Robin's latest remarks because maybe I
> > don't really understand your situation right.
>
> That was really just me thinking out loud to guess at how it must be
> happening - I wasn't sure whether VFIO is actually intended to allow that or
> not, so if not then by all means let's look at fixing that, but as I say I
> think we're only seeing it provoke a problem at the driver level because of
> 9ac8545199a1, and fixing VFIO doesn't fix that in general. And conversely if
> we *can* fix that properly at the IOMMU API level then the current VFIO
> behaviour should become benign again anyway.

Okay, so there are probably other problems here that highlighted
this..

> > IMHO this is definately a VFIO bug, because in a single-device group
> > we must not allow the domain to remain attached past remove(). Or more
> > broadly we shouldn't be holding ownership of a group without also
> > having a driver attached.
>
> FWIW I was assuming it might be fine for a VFIO user to hold the group open
> if they expect the device to come back again and re-bind (for example,
> perhaps over some reconfiguration that requires turning SR-IOV off and on
> again?)

Once all the devices in the group are removed then something like
pci_device_group() will have no way to discover the group again. eg
in the SRIOV case it will just fall right down to iommu_group_alloc(),
and that gives a new struct iommu_group and new IDR allocation.

So in the general case this doesn't happen, I don't think any VFIO
userspace should attempt to rely on it.

>From an API perspective is a much saner API toward iommu using drivers
like VFIO if those drivers only use the iommu api while they have a
device driver attached.

Regards,
Jason