Re: [PATCH] PCI: pciehp: clear cmd_busy bit when Command Completed in polling mode

From: luanshi
Date: Thu Jan 06 2022 - 03:55:12 EST


Hi Bjorn & Lukas & Kuppuswamy & Amey,

Gentle ping! Any comments on this patch,should be merged?


Thanks,

Liguang

在 2021/11/27 1:33, Lukas Wunner 写道:
On Thu, Nov 11, 2021 at 01:42:58PM +0800, Liguang Zhang wrote:
This patch fixes this problem that on driver probe from system startup,
pciehp checks the Presence Detect State bit in the Slot Status register
to bring up an occupied slot or bring down an unoccupied slot. If empty
slot's power status is on, turn power off. The Hot-Plug interrupt isn't
requested yet, so avoid triggering a notification by calling
pcie_disable_notification().

Both the CCIE and HPIE bits are masked in pcie_disable_notification(),
when we issue a hotplug command, pcie_wait_cmd() will polling the
Command Completed bit instead of waiting for an interrupt. But cmd_busy
bit was not cleared when Command Completed which results in timeouts
like this in pciehp_power_off_slot() and pcie_init_notification():

pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x01c0
(issued 2264 msec ago)
pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x05c0
(issued 2288 msec ago)

Signed-off-by: Liguang Zhang <zhangliguang@xxxxxxxxxxxxxxxxx>
Fixes: a5dd4b4b0570 ("PCI: pciehp: Wait for hotplug command completion where necessary")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215143
Reviewed-by: Lukas Wunner <lukas@xxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx # v4.19+

Thanks a lot, that's a really good catch.

It's a somewhat intricate bug, so I'll try to explain in my own words:

If notification is disabled (HPIE or CCIE not set in the Slot Status
register), we rely on pcie_poll_cmd() to poll for Command Completed.
But once it's signaled, we neglect to clear ctrl->cmd_busy.
(Normally it is cleared by the hardirq handler pciehp_isr() if
notification is enabled.)

The result is that starting with the second Slot Control write,
pciehp will gratuitously wait for a command to finish which has
already finished and it will incorrectly report a timeout.

The bug was originally introduced in 2015 by commit a5dd4b4b0570
("PCI: pciehp: Wait for hotplug command completion where necessary"),
but didn't manifest itself because the first Slot Control Write already
enabled notification and from that point on the hardirq handler would
clear ctrl->cmd_busy. However I think the bug may have manifested
itself with pciehp_poll_mode=1.

It wasn't until commit 4e6a13356f1c ("PCI: pciehp: Deduplicate presence
check on probe & resume") that multiple consecutive Slot Control writes
were performed on ->probe with notification disabled, so that's the
commit which first exposed the bug with pciehp_poll_mode=0.

Thanks,

Lukas

---
drivers/pci/hotplug/pciehp_hpc.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 83a0fa119cae..8698aefc6041 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -98,6 +98,8 @@ static int pcie_poll_cmd(struct controller *ctrl, int timeout)
if (slot_status & PCI_EXP_SLTSTA_CC) {
pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
PCI_EXP_SLTSTA_CC);
+ ctrl->cmd_busy = 0;
+ smp_mb();
return 1;
}
msleep(10);
--
2.19.1.6.gb485710b