Re: [PATCH 0/2] pci/iov: avoid device_lock() when reading sriov_numvfs

From: Jim Harris
Date: Fri Feb 09 2024 - 18:15:40 EST


On Thu, Feb 08, 2024 at 06:30:02PM -0600, Bjorn Helgaas wrote:
> [+cc Pierre, author of 35ff867b7657 ("PCI/IOV: Serialize sysfs
> sriov_numvfs reads vs writes")]
>
> On Wed, Dec 20, 2023 at 10:58:12PM +0000, Jim Harris wrote:
> > If SR-IOV enabled device is held by vfio, and device is removed,
> > vfio will hold device lock and notify userspace of the removal. If
> > userspace reads sriov_numvfs sysfs entry, that thread will be
> > blocked since sriov_numvfs_show() also tries to acquire the device
> > lock. If that same thread is responsible for releasing the device to
> > vfio, it results in a deadlock.
> >
> > One patch was proposed to add a separate mutex, specifically for
> > struct pci_sriov, to synchronize access to sriov_numvfs in the sysfs
> > paths (replacing use of the device_lock()). Leon instead suggested
> > just reverting the commit 35ff867b765 which introduced device_lock()
> > in the store path. This also led to a small fix around ordering on
> > the kobject_uevent() when sriov_numvfs is updated.
> >
> > Ref: https://lore.kernel.org/linux-pci/ZXJI5+f8bUelVXqu@ubuntu/
>
> 1) Cc author of the commit being reverted (Pierre) so he has a chance
> to chime in and make sure the proposed fix works for him as well.

Ack. I'll also Cc Pierre on the v2.

> 2) The revert commit log needs to justify the revert, not merely say
> what the proper way is. The Ref: above suggests that the current code
> (pre-revert) leads to a deadlock in some cases, so the revert commit
> log should detail that.
>
> It's ideal if we never regress, not even between the revert and the
> second patch, so it's possible that they should be squashed into a
> single patch. But if you keep it as two patches, it's trivial for me
> to squash them if we decide that's best.

The deadlock I hit is fixed by patch 1 alone. Patch 2 is a separate
bug - it's better to update the num_VFs value before sending the notification
that the num_VFs value changed.

I'll add some more color to that commit message too, to differentiate it
from the revert. I have no issues if you eventually decide to squash them.
>
> 3) Follow subject line convention for drivers/pci (use "git log
> --oneline drivers/pci" to learn it).

Will fix in v2.

Thanks,

Jim