Re: [PATCH 2/2] PCI: Ignore PCIe ports used for tunneling in pcie_bandwidth_available()

From: Bjorn Helgaas
Date: Thu Nov 02 2023 - 11:47:59 EST


On Wed, Nov 01, 2023 at 08:14:31PM -0500, Mario Limonciello wrote:
> On 11/1/2023 17:52, Bjorn Helgaas wrote:
> > On Tue, Oct 31, 2023 at 08:34:38AM -0500, Mario Limonciello wrote:
> > > The USB4 spec specifies that PCIe ports that are used for tunneling
> > > PCIe traffic over USB4 fabric will be hardcoded to advertise 2.5GT/s.
> > >
> > > In reality, these ports' speed is controlled by the fabric implementation.
> >
> > So I guess you're saying the speed advertised by PCI_EXP_LNKSTA is not
> > the actual speed? And we don't have a generic way to find the actual
> > speed?
>
> Correct.
>
> > > Downstream drivers such as amdgpu which utilize pcie_bandwidth_available()
> > > to program the device will always find the PCIe ports used for
> > > tunneling as a limiting factor and may make incorrect decisions.
> > >
> > > To prevent problems in downstream drivers check explicitly for ports
> > > being used for PCIe tunneling and skip them when looking for bandwidth
> > > limitations.
> > >
> > > 2 types of devices are detected:
> > > 1) PCIe root port used for PCIe tunneling
> > > 2) Intel Thunderbolt 3 bridge
> > >
> > > Downstream drivers could make this change on their own but then they
> > > wouldn't be able to detect other potential speed bottlenecks.
> >
> > Is the implication that a tunneling port can *never* be a speed
> > bottleneck? That seems to be how this patch would work in practice.
>
> I think that's a stretch we should avoid concluding.

I'm just reading the description and the patch, which seem to say that
pcie_bandwidth_available() will never report a tunneling port as the
limiting port.

Maybe this can be rectified with a comment about how we can't tell the
actual bandwidth of a tunneled port, and it may be a hidden unreported
bottleneck, so pcie_bandwidth_available() can't actually return a
reliable result. Seems sort of unsatisfactory, but ... I dunno, maybe
it's the best we can do.

> IIUC the fabric can be hosting other traffic and it's entirely possible the
> traffic over the tunneling port runs more slowly at times.
>
> Perhaps that's why the USB4 spec decided to advertise it this way? I
> don't know.

Maybe, although the same happens on shared PCIe links above switches.

> > > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2925
> >
> > This issue says the external GPU doesn't work at all. Does this patch
> > fix that? This patch looks like it might improve GPU performance, but
> > wouldn't fix something that didn't work at all.
>
> The issue actually identified 4 distinct problems. The 3 functional
> problems will be fixed in amdgpu.
>
> This performance one was from later in the ticket after some back and forth
> identifying proper solutions for the first 3.

There's a lot of material in that report. Is there a way to link to
the specific part related to performance?

> > > + * This function excludes root ports and bridges used for USB4 and TBT3 tunneling.

Wrap to fit in 80 columns like the rest of the file.

> > > */
> > > u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev,
> > > enum pci_bus_speed *speed,
> > > @@ -6254,6 +6290,10 @@ u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev,
> > > bw = 0;
> > > while (dev) {
> > > + /* skip root ports and bridges used for tunneling */
> > > + if (pcie_is_tunneling_port(dev))
> > > + goto skip;
> > > +
> > > pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
> > > next_speed = pcie_link_speed[lnksta & PCI_EXP_LNKSTA_CLS];
> > > @@ -6274,6 +6314,7 @@ u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev,
> > > *width = next_width;
> > > }
> > > +skip:
> > > dev = pci_upstream_bridge(dev);
> > > }