Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"

From: Alex G.
Date: Tue Feb 02 2021 - 15:29:26 EST


On 2/2/21 2:16 PM, Bjorn Helgaas wrote:
On Tue, Feb 02, 2021 at 01:50:20PM -0600, Alex G. wrote:
On 1/29/21 3:56 PM, Bjorn Helgaas wrote:
On Thu, Jan 28, 2021 at 06:07:36PM -0600, Alex G. wrote:
On 1/28/21 5:51 PM, Sinan Kaya wrote:
On 1/28/2021 6:39 PM, Bjorn Helgaas wrote:
AFAICT, this thread petered out with no resolution.

If the bandwidth change notifications are important to somebody,
please speak up, preferably with a patch that makes the notifications
disabled by default and adds a parameter to enable them (or some other
strategy that makes sense).

I think these are potentially useful, so I don't really want to just
revert them, but if nobody thinks these are important enough to fix,
that's a possibility.

Hide behind debug or expert option by default? or even mark it as BROKEN
until someone fixes it?

Instead of making it a config option, wouldn't it be better as a kernel
parameter? People encountering this seem quite competent in passing kernel
arguments, so having a "pcie_bw_notification=off" would solve their
problems.

I don't want people to have to discover a parameter to solve issues.
If there's a parameter, notification should default to off, and people
who want notification should supply a parameter to enable it. Same
thing for the sysfs idea.

I can imagine cases where a per-port flag would be useful. For example, a
machine with a NIC and a couple of PCIe storage drives. In this example, the
PCIe drives downtrain willy-nilly, so it's useful to turn off their
notifications, but the NIC absolutely must not downtrain. It's debatable
whether it should be default on or default off.

I think we really just need to figure out what's going on. Then it
should be clearer how to handle it. I'm not really in a position to
debug the root cause since I don't have the hardware or the time.

I wonder
(a) if some PCIe devices are downtraining willy-nilly to save power
(b) if this willy-nilly downtraining somehow violates PCIe spec
(c) what is the official behavior when downtraining is intentional

My theory is: YES, YES, ASPM. But I don't know how to figure this out
without having the problem hardware in hand.

If nobody can figure out what's going on, I think we'll have to make it
disabled by default.

I think most distros do "CONFIG_PCIE_BW is not set". Is that not true?

I think it *is* true that distros do not enable CONFIG_PCIE_BW.

But it's perfectly reasonable for people building their own kernels to
enable it. It should be safe to enable all config options. If they
do enable CONFIG_PCIE_BW, I don't want them to waste time debugging
messages they don't expect.

If we understood why these happen and could filter out the expected
ones, that would be great. But we don't. We've already wasted quite
a bit of Jan's and Atanas' time, and no doubt others who haven't
bothered to file bug reports.

So I think I'll queue up a patch to remove the functionality for now.
It's easily restored if somebody debugs the problem or adds a
command-line switch or something.

I think it's best we make it a module (or kernel) parameter, default=off for the time being.
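
To make the "default=off" semantics concrete, here is a rough userspace sketch, not kernel code. It assumes the parameter keeps the name "pcie_bw_notification" floated earlier in the thread (no such parameter exists today), and the helper bw_notification_enabled() is purely hypothetical. The point is only that notifications stay off unless someone explicitly asks for them:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical parameter name taken from this thread; not an existing
 * kernel parameter. */
#define BW_NOTIF_PARAM "pcie_bw_notification="

/*
 * Userspace model of an opt-in switch: scan a space-separated command
 * line and return true only if the parameter is explicitly set to "on".
 * A missing parameter or an unrecognized value leaves notifications
 * disabled. If the parameter appears more than once, the last
 * recognized value wins.
 */
static bool bw_notification_enabled(const char *cmdline)
{
	bool enabled = false;	/* default: off */
	const size_t klen = strlen(BW_NOTIF_PARAM);
	const char *p = cmdline;

	while (*p) {
		size_t toklen = strcspn(p, " ");	/* length of this token */

		if (toklen >= klen && strncmp(p, BW_NOTIF_PARAM, klen) == 0) {
			const char *val = p + klen;
			size_t vlen = toklen - klen;

			if (vlen == 2 && strncmp(val, "on", 2) == 0)
				enabled = true;
			else if (vlen == 3 && strncmp(val, "off", 3) == 0)
				enabled = false;
			/* any other value is ignored */
		}

		p += toklen;
		while (*p == ' ')	/* skip separators */
			p++;
	}

	return enabled;
}
```

In a real patch this would more likely be a core parameter or module parameter hooked into the bandwidth notification service driver, but the opt-in behavior would be the same: nobody sees the messages unless they went looking for them.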

Alex