Re: [PATCH 1/3] PCI: Add helper to check if any of ancestor device support D3cold

From: Bjorn Helgaas
Date: Sat Aug 26 2023 - 09:11:42 EST


On Fri, Aug 25, 2023 at 09:39:48AM +0300, Mika Westerberg wrote:
> On Fri, Aug 25, 2023 at 01:43:08PM +0800, Kai-Heng Feng wrote:
> > On Fri, Aug 25, 2023 at 1:29 PM Mika Westerberg
> > <mika.westerberg@xxxxxxxxxxxxxxx> wrote:
> > > On Thu, Aug 24, 2023 at 09:46:00PM +0800, Kai-Heng Feng wrote:
> > > > On Thu, Aug 24, 2023 at 7:57 PM Mika Westerberg
> > > > <mika.westerberg@xxxxxxxxxxxxxxx> wrote:

> > I think what Bjorn suggested is to keep AER enabled for D3hot, and
> > only disable it for D3cold and S3.
> >
> > > > Unless there are cases when device firmware behave differently to
> > > > D3hot? Then maybe it's better to disable AER for both D3hot, D3cold
> > > > and system S3.
> > >
> > > Yes, this makes sense.
> >
> > I agree that differentiate between D3hot and D3cold unnecessarily make
> > things more complicated, but Bjorn suggested errors reported by AER
> > under D3hot should still be recorded.
> > Do you have more compelling data to persuade Bjorn that AER should be
> > disabled for both D3 states?
>
> Is there even an AER error that can happen when a device is in D3hot
> (link is in L1) or D3cold (link is in L2/3)? I'm not an expert in AER
> but AFAICT these errors are reported when the device is in active state
> not when it is in low power state.

I don't think a device in D3cold can signal its own errors. But the
link transition to L2/L3 as a device goes to D3cold may cause the
bridge above to log an error. And of course a config access to a
device in D3cold will probably result in an Unsupported Request being
logged by the bridge above it. I think these are the sorts of errors
we do need to avoid or ignore somehow.

But Configuration and Message requests definitely happen in D3hot, and
they can cause errors reported via AER. The spec (r6.0, sec 2.2.8)
recommends that Messages be handled the same in D0-D3hot.

PTM is an example of where we had errors being reported at suspend/
resume because we had it configured incorrectly. If we disabled AER
in D3hot we might not learn about that kind of configuration problem.
That's what makes me think there's some value in keeping AER enabled
in D3hot.

Bjorn