Re: Race b/w pciehp and FirmwareFirst DPC->EDR flow - Reg

From: Lukas Wunner
Date: Mon Nov 27 2023 - 10:26:54 EST


Hi Vidya,

sorry for the delay, still catching up on e-mails after Plumbers...

On Fri, Nov 10, 2023 at 10:31:55PM +0530, Vidya Sagar wrote:
> > - System doesn't have support for in-band PD and supports only OOB PD
> > where writing to a private register would set the PD state

We already have an inband_presence_disabled flag in struct controller
which is set if the In-Band PD Disable Supported bit in the Slot
Capabilities 2 Register is set. The flag may also be set through the
inband_presence_disabled_dmi_table[].

Currently the only place where the flag makes a difference is on
slot bringup: pciehp_check_link_status() doesn't wait for the
Presence Detect Status bit to become set.

I'm wondering if we need to also disregard PDC events if In-Band PD
is disabled. Not sure if the behavior you're seeing is caused by a
quirk of the hardware or is expected if In-Band PD is disabled.
Probably the former. A code change would generally only be acceptable
in the latter case though I think.


> > 10. Since PDC (Presence Detect Change) bit is also set for the first
> > interrupt, IST attempts to remove the devices (as part of
> > pciehp_handle_presence_or_link_change())
> >
> > At this point, there is a race between the device driver that is
> > trying to work with the device (through pci_error_handlers callback)
> > and the IST that is trying to remove the device.
> > To be fair to pciehp_handle_presence_or_link_change(), after removing
> > the devices, it checks for the link-up/PD being '1' and scans the
> > devices again if the device is still available. But unfortunately,
> > IST is deadlocked (with the device driver) while removing the devices
> > itself and won't go to the next step.

Could you provide stacktraces of the two deadlocked tasks?
Right now I don't quite understand why they're deadlocked.

Are you getting hung task messages in dmesg?
They should include stacktraces.

Also, which kernel version are we talking about?

Thanks,

Lukas