Re: [PATCH v2] igc: Ignore AER reset when device is suspended

From: Kai-Heng Feng
Date: Mon Jul 17 2023 - 03:38:32 EST


[+Cc Aaron]

On Fri, Jul 14, 2023 at 10:54 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> On Fri, Jul 14, 2023 at 01:05:41PM +0800, Kai-Heng Feng wrote:
> > When a system that connects to a Thunderbolt dock equipped with I225,
> > like HP Thunderbolt Dock G4, I225 stops working after S3 resume:
> > ...
>
> > The issue is that the PTM requests are sending before driver resumes the
> > device. Since the issue can also be observed on Windows, it's quite
> > likely a firmware/hardware limitation.
>
> Does this mean we didn't disable PTM correctly on suspend? Or is the

PTM gets disabled correctly during suspend, by commit c01163dbd1b8
("PCI/PM: Always disable PTM for all devices during suspend").
Before that commit the suspend will fail.

> device defective and sending PTM requests even though PTM is disabled?

Yes. When S3 resume, I guess the firmware resets the dock and/or I225
so PTM request starts even before the OS is resumed.
AFAIK the issue doesn't happen when s2Idle is used.

>
> If the latter, I vote for a quirk that just disables PTM completely
> for this device.

The S3 resume enables PTM regardless of OS involvement. So I don't
think this will work.

>
> This check in .error_detected() looks out of place to me because
> there's no connection between AER and PTM, there's no connection
> between PTM and the device being enabled, and the connection between
> the device being enabled and being fully resumed is a little tenuous.

True. This patch is just a workaround.

Have you considered my other proposed approach? Like disable AER
completely during suspend, or even defer the resuming of PCIe services
after the entire hierarchy is resumed?

>
> If we must do it this way, maybe add a comment about *why* we're
> checking pci_is_enabled(). Otherwise this will be copied to other
> drivers that don't need it.

Sure.

Kai-Heng

>
> > So avoid resetting the device if it's not resumed. Once the device is
> > fully resumed, the device can work normally.
> >
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216850
> > Reviewed-by: Guilherme G. Piccoli <gpiccoli@xxxxxxxxxx>
> > Acked-by: Vinicius Costa Gomes <vinicius.gomes@xxxxxxxxx>
> > Signed-off-by: Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx>
> >
> > ---
> > v2:
> > - Fix typo.
> > - Mention the product name.
> >
> > drivers/net/ethernet/intel/igc/igc_main.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> > index 9f93f0f4f752..8c36bbe5e428 100644
> > --- a/drivers/net/ethernet/intel/igc/igc_main.c
> > +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> > @@ -7115,6 +7115,9 @@ static pci_ers_result_t igc_io_error_detected(struct pci_dev *pdev,
> > struct net_device *netdev = pci_get_drvdata(pdev);
> > struct igc_adapter *adapter = netdev_priv(netdev);
> >
> > + if (!pci_is_enabled(pdev))
> > + return 0;
> > +
> > netif_device_detach(netdev);
> >
> > if (state == pci_channel_io_perm_failure)
> > --
> > 2.34.1
> >