Re: [Intel-wired-lan] [PATCH] igc: Ignore AER reset when device is suspended

From: Bjorn Helgaas
Date: Thu Jun 22 2023 - 09:11:32 EST


On Thu, Jun 22, 2023 at 08:09:34AM +0300, Neftin, Sasha wrote:
> On 6/21/2023 23:43, Bjorn Helgaas wrote:
> > On Tue, Jun 20, 2023 at 08:36:36PM +0800, Kai-Heng Feng wrote:
> > > When a system is connected to a Thunderbolt dock equipped with an I225,
> > > the I225 stops working after S3 resume:

> > > The issue is that PTM requests are sent before the driver resumes the
> > > device. Since the issue can also be observed on Windows, it's quite
> > > likely a firmware/hardware limitation.
> >
> > I thought c01163dbd1b8 ("PCI/PM: Always disable PTM for all devices
> > during suspend") would turn off PTM. Is that not working for this
> > path, or are we re-enabling PTM incorrectly, or something else?
>
> I think we hit a HW bug here. On some i225/6 parts, PTM requests are
> sent before SW takes ownership of the device. This patch could help.

Is there an erratum we can read? If this is needed to work around a
hardware defect, there should be a comment in the code to that effect,
and we should have a better understanding because there may be other
scenarios (suspend/resume, hotplug, etc.) that need similar changes.

(I know this patch is to work around a suspend/resume issue, but the
change is in the AER error recovery path, so it doesn't quite fit
together for me yet.)

Are you saying the NIC sends PTM requests when it doesn't have PTM
Enable set?
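For reference, PTM Enable is bit 0 of the PTM Control register at offset 0x08 of the PTM extended capability (ID 0x1f), so a config-space snapshot taken when the stray requests appear would show whether the bit is set. The sketch below is a minimal userspace illustration, assuming a 4 KB little-endian config-space dump (for example, one saved as root from sysfs `config`). The constants mirror include/uapi/linux/pci_regs.h; the helper names (cfg_read32, cfg_write32, find_ext_cap, ptm_enabled) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values as in include/uapi/linux/pci_regs.h */
#define PCI_CFG_SPACE_SIZE      256
#define PCI_CFG_SPACE_EXP_SIZE  4096
#define PCI_EXT_CAP_ID_PTM      0x1f
#define PCI_PTM_CTRL            0x08
#define PCI_PTM_CTRL_ENABLE     0x00000001
#define PCI_EXT_CAP_ID(header)   ((header) & 0x0000ffff)
#define PCI_EXT_CAP_NEXT(header) (((header) >> 20) & 0xffc)

/* Hypothetical helper: read a little-endian dword from the snapshot. */
static uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
	assert(off % 4 == 0 && off + 4 <= PCI_CFG_SPACE_EXP_SIZE);
	return (uint32_t)cfg[off] | (uint32_t)cfg[off + 1] << 8 |
	       (uint32_t)cfg[off + 2] << 16 | (uint32_t)cfg[off + 3] << 24;
}

/* Hypothetical helper: write a little-endian dword (to build snapshots). */
static void cfg_write32(uint8_t *cfg, size_t off, uint32_t val)
{
	assert(off % 4 == 0 && off + 4 <= PCI_CFG_SPACE_EXP_SIZE);
	for (int i = 0; i < 4; i++)
		cfg[off + i] = (uint8_t)(val >> (8 * i));
}

/* Walk the extended capability list starting at 0x100; 0 if not found. */
static size_t find_ext_cap(const uint8_t *cfg, uint16_t cap_id)
{
	size_t pos = PCI_CFG_SPACE_SIZE;
	/* Bound the walk so a malformed (looping) list cannot hang us. */
	int ttl = (PCI_CFG_SPACE_EXP_SIZE - PCI_CFG_SPACE_SIZE) / 8;

	while (ttl-- > 0) {
		uint32_t hdr = cfg_read32(cfg, pos);

		if (hdr == 0 || hdr == 0xffffffff)
			return 0;	/* no extended caps, or device absent */
		if (PCI_EXT_CAP_ID(hdr) == cap_id)
			return pos;
		pos = PCI_EXT_CAP_NEXT(hdr);
		if (pos < PCI_CFG_SPACE_SIZE)
			return 0;	/* end of list */
	}
	return 0;
}

/* True only if a PTM capability exists and its PTM Enable bit is set. */
static bool ptm_enabled(const uint8_t *cfg)
{
	size_t ptm = find_ext_cap(cfg, PCI_EXT_CAP_ID_PTM);

	if (!ptm || ptm + PCI_PTM_CTRL + 4 > PCI_CFG_SPACE_EXP_SIZE)
		return false;
	return (cfg_read32(cfg, ptm + PCI_PTM_CTRL) & PCI_PTM_CTRL_ENABLE) != 0;
}
```

Inside the kernel, the same check would normally use pci_find_ext_capability(dev, PCI_EXT_CAP_ID_PTM) followed by pci_read_config_dword() on the PTM Control offset, rather than parsing a raw snapshot.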

What exactly does it mean for "SW to take ownership of the device"?
What PCIe transaction would tell the device the SW has taken
ownership?

So far this feels kind of hand-wavey.

> > Checking pci_is_enabled() in the .error_detected() callback looks like
> > a pattern that may need to be replicated in many other drivers, which
> > makes me think it may not be the best approach.
> >
> > > So avoid resetting the device if it's not resumed. Once the device is
> > > fully resumed, the device can work normally.
> > >
> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216850
> > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx>
> > > ---
> > > drivers/net/ethernet/intel/igc/igc_main.c | 3 +++
> > > 1 file changed, 3 insertions(+)
> > >
> > > diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> > > index fa764190f270..6a46f886ff43 100644
> > > --- a/drivers/net/ethernet/intel/igc/igc_main.c
> > > +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> > > @@ -6962,6 +6962,9 @@ static pci_ers_result_t igc_io_error_detected(struct pci_dev *pdev,
> > > struct net_device *netdev = pci_get_drvdata(pdev);
> > > struct igc_adapter *adapter = netdev_priv(netdev);
> > > + if (!pci_is_enabled(pdev))
> > > + return 0;
> > > +
> > > netif_device_detach(netdev);
> > > if (state == pci_channel_io_perm_failure)