Re: [PATCH RFC v4 02/21] PCI: Fix race condition in pci_enable/disable_device()

From: Sergey Miroshnichenko
Date: Wed Mar 27 2019 - 13:21:45 EST


On 3/26/19 10:00 PM, Bjorn Helgaas wrote:
> [+cc Srinath, Marta, LKML]
>
> On Mon, Mar 11, 2019 at 04:31:03PM +0300, Sergey Miroshnichenko wrote:
>> CPU0 CPU1
>>
>> pci_enable_device_mem() pci_enable_device_mem()
>> pci_enable_bridge() pci_enable_bridge()
>> pci_is_enabled()
>> return false;
>> atomic_inc_return(enable_cnt)
>> Start actual enabling the bridge
>> ... pci_is_enabled()
>> ... return true;
>> ... Start memory requests <-- FAIL
>> ...
>> Set the PCI_COMMAND_MEMORY bit <-- Must wait for this
>>
>> This patch protects the pci_enable/disable_device() and pci_enable_bridge()
>> with mutexes.
>
> This is a subtle issue that we've tried to fix before, but we've never
> had a satisfactory solution, so I hope you've figured out the right
> fix.
>
> I'll include some links to previous discussion. This patch is very
> similar to [2], which we didn't actually apply. We did apply the
> patch from [3] as 40f11adc7cd9 ("PCI: Avoid race while enabling
> upstream bridges"), but it caused the regressions reported in [4,5],
> so we reverted it with 0f50a49e3008 ("Revert "PCI: Avoid race while
> enabling upstream bridges"").
>

Thanks for the links, I wasn't aware of these discussions and patches!

On PowerNV this issue is partially hidden by db2173198b95 ("powerpc/powernv/pci: Work
around races in PCI bridge enabling"), and on x86 BIOS pre-initializes all the bridges, so
it doesn't reproduce until hotplugging in a hotplugged bridge.

This patch is indeed similar to 40f11adc7cd9 ("PCI: Avoid race while enabling upstream
bridges"), but instead of a single static mutex it adds per-device mutexes and prevents
the dev->enable_cnt from incrementing too early. So it's not needed anymore to carefully
select a moment safe enough to enable the device.

Serge

> I think the underlying design problem is that we have a driver for
> device B calling pci_enable_device(), and it is changing the state of
> device A (an upstream bridge). The model generally is that a driver
> should only touch the device it is bound to.
>
> It's tricky to get the locking right when several children of device A
> all need to operate on A.
>
> That's all to say I'll have to think carefully about this particular
> patch, so I'll go on to the others and come back to this one.
>
> Bjorn
>
> [1] https://lore.kernel.org/linux-pci/1494256190-28993-1-git-send-email-srinath.mannam@xxxxxxxxxxxx/T/#u
> [RFC PATCH] pci: Concurrency issue in NVMe Init through PCIe switch
>
> [2] https://lore.kernel.org/linux-pci/1496135297-19680-1-git-send-email-srinath.mannam@xxxxxxxxxxxx/T/#u
> [RFC PATCH v2] pci: Concurrency issue in NVMe Init through PCIe switch
>
> [3] https://lore.kernel.org/linux-pci/1501858648-22228-1-git-send-email-srinath.mannam@xxxxxxxxxxxx/T/#u
> [RFC PATCH v3] pci: Concurrency issue during pci enable bridge
>
> [4] https://lore.kernel.org/linux-pci/150547971091.977464.16294045866179907260.stgit@buzz/T/#u
> [PATCH bisected regression in 4.14] PCI: fix race while enabling upstream bridges concurrently
>
> [5] https://lore.kernel.org/linux-wireless/04c9b578-693c-1dc6-9f0f-904580231b21@xxxxxxxxx/T/#u
> iwlwifi firmware load broken in current -git
>
> [6] https://lore.kernel.org/linux-pci/744877924.5841545.1521630049567.JavaMail.zimbra@xxxxxxxxx/T/#u
> [RFC PATCH] nvme: avoid race-conditions when enabling devices
>
>> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@xxxxxxxxx>
>> ---
>> drivers/pci/pci.c | 26 ++++++++++++++++++++++----
>> drivers/pci/probe.c | 1 +
>> include/linux/pci.h | 1 +
>> 3 files changed, 24 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index f006068be209..895201d4c9e6 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -1615,6 +1615,8 @@ static void pci_enable_bridge(struct pci_dev *dev)
>> struct pci_dev *bridge;
>> int retval;
>>
>> + mutex_lock(&dev->enable_mutex);
>> +
>> bridge = pci_upstream_bridge(dev);
>> if (bridge)
>> pci_enable_bridge(bridge);
>> @@ -1622,6 +1624,7 @@ static void pci_enable_bridge(struct pci_dev *dev)
>> if (pci_is_enabled(dev)) {
>> if (!dev->is_busmaster)
>> pci_set_master(dev);
>> + mutex_unlock(&dev->enable_mutex);
>> return;
>> }
>>
>> @@ -1630,11 +1633,14 @@ static void pci_enable_bridge(struct pci_dev *dev)
>> pci_err(dev, "Error enabling bridge (%d), continuing\n",
>> retval);
>> pci_set_master(dev);
>> + mutex_unlock(&dev->enable_mutex);
>> }
>>
>> static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
>> {
>> struct pci_dev *bridge;
>> + /* Enable-locking of bridges is performed within the pci_enable_bridge() */
>> + bool need_lock = !dev->subordinate;
>> int err;
>> int i, bars = 0;
>>
>> @@ -1650,8 +1656,13 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
>> dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
>> }
>>
>> - if (atomic_inc_return(&dev->enable_cnt) > 1)
>> + if (need_lock)
>> + mutex_lock(&dev->enable_mutex);
>> + if (pci_is_enabled(dev)) {
>> + if (need_lock)
>> + mutex_unlock(&dev->enable_mutex);
>> return 0; /* already enabled */
>> + }
>>
>> bridge = pci_upstream_bridge(dev);
>> if (bridge)
>> @@ -1666,8 +1677,10 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
>> bars |= (1 << i);
>>
>> err = do_pci_enable_device(dev, bars);
>> - if (err < 0)
>> - atomic_dec(&dev->enable_cnt);
>> + if (err >= 0)
>> + atomic_inc(&dev->enable_cnt);
>> + if (need_lock)
>> + mutex_unlock(&dev->enable_mutex);
>> return err;
>> }
>>
>> @@ -1910,15 +1923,20 @@ void pci_disable_device(struct pci_dev *dev)
>> if (dr)
>> dr->enabled = 0;
>>
>> + mutex_lock(&dev->enable_mutex);
>> dev_WARN_ONCE(&dev->dev, atomic_read(&dev->enable_cnt) <= 0,
>> "disabling already-disabled device");
>>
>> - if (atomic_dec_return(&dev->enable_cnt) != 0)
>> + if (atomic_dec_return(&dev->enable_cnt) != 0) {
>> + mutex_unlock(&dev->enable_mutex);
>> return;
>> + }
>>
>> do_pci_disable_device(dev);
>>
>> dev->is_busmaster = 0;
>> +
>> + mutex_unlock(&dev->enable_mutex);
>> }
>> EXPORT_SYMBOL(pci_disable_device);
>>
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index 2ec0df04e0dc..977a127ce791 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -2267,6 +2267,7 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus)
>> INIT_LIST_HEAD(&dev->bus_list);
>> dev->dev.type = &pci_dev_type;
>> dev->bus = pci_bus_get(bus);
>> + mutex_init(&dev->enable_mutex);
>>
>> return dev;
>> }
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 77448215ef5b..cb2760a31fe2 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -419,6 +419,7 @@ struct pci_dev {
>> unsigned int no_vf_scan:1; /* Don't scan for VFs after IOV enablement */
>> pci_dev_flags_t dev_flags;
>> atomic_t enable_cnt; /* pci_enable_device has been called */
>> + struct mutex enable_mutex;
>>
>> u32 saved_config_space[16]; /* Config space saved at suspend time */
>> struct hlist_head saved_cap_space;
>> --
>> 2.20.1
>>

Attachment: signature.asc
Description: OpenPGP digital signature