Re: [PATCH v4] PCI: Relabel JHL6540 on Lenovo X1 Carbon 7,8

From: Mika Westerberg
Date: Thu Jan 18 2024 - 01:01:51 EST


On Wed, Jan 17, 2024 at 04:21:18PM -0500, Esther Shimanovich wrote:
> Thank you for all your comments! I really appreciate all your help
> with this. I will address the style feedback once we reach a decision
> on how we will fix this bug.
> I first will respond to your comments, and then I will list out the
> possible solutions to this bug, in a way that takes into account all
> of your insights.
>
> On Tue, Dec 26, 2023 at 7:15 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > Can you include a citation (spec name, revision, section) for this
> > DMAR requirement?
> >
> This was my mistake–I misinterpreted what a firmware developer told
> me. This is a firmware ACPI requirement from windows, which is not in
> the DMAR spec. Windows uses it to identify externally exposed PCIE
> root ports.
> https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports
>
> > But I don't see where the defect is here. And I doubt that this is
> > really a unique situation. So it's likely that this will happen on
> > other systems, and we don't want to have to add quirks every time
> > another one shows up.
> ...
> > don't have the new interface. But we at least need a plan that
> > doesn't require quirks indefinitely.
> ...
> On Thu, Dec 28, 2023 at 8:41 AM Mika Westerberg
> <mika.westerberg@xxxxxxxxxxxxxxx> wrote:
> > This is not scalable at all. You would need to include lots of systems
> > here. And there should be no issue at all anyways.
> My team tests hundreds of different devices, and this is the only one
> which exhibited this issue that we’ve seen so far.
> No other devices we’ve seen so far have a discrete internal
> Thunderbolt controller which is treated as a removable device.
> Therefore, we don’t expect that a large number of devices will need
> this quirk.

Well that's pretty much all Intel Titan Ridge and Maple Ridge based
systems. Some early ones did not use IOMMU but all the rest do.

> > There is really nothing "unique" here. It's exactly as specified by
> > this:
> >
> > https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports
> >
> > and being used in many many system already out there and those have been
> > working just fine.
> I don’t know how many computers have a discrete Thunderbolt chip that
> is separate from their CPU, but this doesn’t seem to be a common
> occurrence.
> These devices were made during a narrow window of time when CPUs
> didn’t have Thunderbolt features yet, so a separate JHL6540 chip had
> to be added so that Lenovo could include Thunderbolt on X1 Carbon Gen
> 7/8.

Before Intel Ice Lake it was all discrete and it is still discrete with
the Barlow Ridge controller who will have exact same ExternalFacing port
as the previous models.

> As you said, these devices do indeed work fine in cases where you
> don’t care if a PCI Thunderbolt device is internal or external, which
> is most cases.
> Problems happen only whenever someone adds a security policy, or some
> other feature that cares about the distinction between a fixed or
> removable PCI device.

Do we have such security policy in the mainline kernel?

> > This has been working just fine so far and as far as I can tell there is
> > no such "policy" in place in the mainline kernel.
> Correct, there is no such policy in the mainline kernel as of now. The
> bug is that the linux kernel’s “removable” property is inaccurate for
> this device.

Or perhaps the "policy" should know this better? IIRC there were some
"exceptions" in the Chrome kernel that allowed to put these devices into
"allowlist" is this not the case anymore?

> > Can you elaborate what the issue is and which mainline kernel you are
> > using to reproduce this?
> Thanks for this question! On a Lenovo Thinkpad Gen 7/Gen 8 computer
> with the linux kernel installed, when you look at the properties of
> the JHL6540 Thunderbolt controller, you see that it is incorrectly
> labeled as removable. I have replicated this bug on the b85ea95d0864
> Linux 6.7-rc1 kernel.
>
> Before my patch, you see that the JHL6540 controller is inaccurately
> labeled “removable”:
> $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e
> {removable} -e {device} -e {vendor} -e looking
> looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0':
> ATTR{device}=="0x15d3"
> ATTR{removable}=="removable"
> ATTR{vendor}=="0x8086"

This is actually accurate. The Thunderbolt controller is itself
hot-removable and that BTW happens to be hot-removed when fwupd applies
firmware upgrades to the device.