Re: [PATCH v3 1/1] vfio: remove VFIO_GROUP_NOTIFY_SET_KVM

From: Matthew Rosato
Date: Thu Jan 05 2023 - 19:16:59 EST


On 1/5/23 6:34 PM, Jason Gunthorpe wrote:
> On Thu, Jan 05, 2023 at 03:09:30PM -0700, Alex Williamson wrote:
>> On Thu, 19 May 2022 14:33:11 -0400
>> Matthew Rosato <mjrosato@xxxxxxxxxxxxx> wrote:
>>
>>> Rather than relying on a notifier for associating the KVM with
>>> the group, let's assume that the association has already been
>>> made prior to device_open. The first time a device is opened
>>> associate the group KVM with the device.
>>>
>>> This fixes a user-triggerable oops in GVT.
>>
>> It seems this has traded an oops for a deadlock, which still exists
>> today in both GVT-g and vfio-ap. These are the only vfio drivers that
>> care about kvm, so they make use of kvm_{get,put}_kvm(), where the

vfio-pci-zdev does too.

>> latter is called by their .close_device() callbacks.

Huh, I've never seen this deadlock with vfio-pci-zdev or vfio-ap, but I see what you're saying. I guess it doesn't show up under typical circumstances with QEMU because kvm_vfio_group_del would already have been triggered via KVM_DEV_VFIO_GROUP_DEL by the time we close the device, so the group would no longer be found during the kvm_vfio_destroy call? (I'm not at all suggesting that we should rely on userspace behaving in this order; I'm just wondering why I never saw it while testing.)

>
> Bleck
>
> It is pretty common to run the final part of 'put' from a workqueue
> specifically to avoid stuff like this, eg fput does it
>
> Maybe that is the simplest?

Yeah, this is also what I was thinking: replace the direct kvm_put_kvm calls in each driver with, say, schedule_delayed_work, where the delayed task just does the kvm_put_kvm (along with a brief comment explaining why we handle the put asynchronously).

Other than that... The original goal of this patch was to get the kvm reference at the first open_device and release it with the very last close_device, so the only other option I could think of would be to take that responsibility back from the vfio drivers and do the kvm_get_kvm and kvm_put_kvm directly in vfio_main after dropping the (but that would result in some ugly symbol linkage and would acquire kvm references that a driver may not care about, so I don't really like that idea).