Re: [PATCH v4] PCI: Relabel JHL6540 on Lenovo X1 Carbon 7,8

From: Esther Shimanovich
Date: Wed Jan 17 2024 - 16:21:43 EST


Thank you for all your comments! I really appreciate all your help
with this. I will address the style feedback once we reach a decision
on how we will fix this bug.
I will first respond to your comments, and then list the possible
solutions to this bug in a way that takes all of your insights into
account.

On Tue, Dec 26, 2023 at 7:15 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> Can you include a citation (spec name, revision, section) for this
> DMAR requirement?
>
This was my mistake: I misinterpreted what a firmware developer told
me. This is a firmware ACPI requirement from Windows, which is not in
the DMAR spec. Windows uses it to identify externally exposed PCIe
root ports.
https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports

> But I don't see where the defect is here. And I doubt that this is
> really a unique situation. So it's likely that this will happen on
> other systems, and we don't want to have to add quirks every time
> another one shows up.
..
> don't have the new interface. But we at least need a plan that
> doesn't require quirks indefinitely.
..
On Thu, Dec 28, 2023 at 8:41 AM Mika Westerberg
<mika.westerberg@xxxxxxxxxxxxxxx> wrote:
> This is not scalable at all. You would need to include lots of systems
> here. And there should be no issue at all anyways.
My team tests hundreds of different devices, and so far this is the
only one that has exhibited this issue.
No other device we have seen has a discrete internal Thunderbolt
controller that is treated as a removable device.
Therefore, we don’t expect that a large number of devices will need
this quirk.

> There is really nothing "unique" here. It's exactly as specified by
> this:
>
> https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports
>
> and being used in many many system already out there and those have been
> working just fine.
I don’t know how many computers have a discrete Thunderbolt chip that
is separate from their CPU, but this doesn’t seem to be a common
occurrence.
These devices were made during a narrow window of time when CPUs
didn’t have Thunderbolt features yet, so a separate JHL6540 chip had
to be added so that Lenovo could include Thunderbolt on X1 Carbon Gen
7/8.

As you said, these devices do indeed work fine in cases where you
don’t care whether a PCI Thunderbolt device is internal or external,
which is most cases.
Problems arise only when someone adds a security policy, or some
other feature, that cares about the distinction between a fixed and a
removable PCI device.
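
To make the failure mode concrete, here is a minimal Python sketch of
the kind of userspace policy that would be affected. The function name
and the three outcomes are hypothetical and chosen only for
illustration; the only real-world piece is the set of values the sysfs
"removable" attribute can report ("fixed", "removable", "unknown").

```python
def policy_for(removable_attr):
    """Hypothetical policy keyed on the sysfs `removable` value.

    This is an illustrative assumption, not an existing tool or kernel
    policy.
    """
    value = removable_attr.strip()
    if value == "fixed":
        # Built-in hardware: trusted without further checks.
        return "allow"
    if value == "removable":
        # User-pluggable hardware: ask before letting it do anything.
        return "require-authorization"
    # "unknown" or anything unexpected: be conservative.
    return "deny"
```

Under a policy like this, an internal controller that reports
"removable" would be sent down the "require-authorization" path even
though the user cannot unplug it.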

> This has been working just fine so far and as far as I can tell there is
> no such "policy" in place in the mainline kernel.
Correct, there is no such policy in the mainline kernel as of now. The
bug is that the Linux kernel’s “removable” property is inaccurate for
this device.
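
For reference, a toy Python model of how the label can end up on an
internal device. It assumes the rule used by pci_set_removable() in
drivers/pci/probe.c (a device is removable if its upstream bridge is
external-facing or is itself removable), and it assumes that firmware
marks root port 00:1d.4 as external-facing. The class and field names
are hypothetical; this is a sketch of the propagation, not kernel code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PciDev:
    name: str
    parent: Optional["PciDev"] = None
    # Assumed to come from firmware (e.g. the ExternalFacingPort property).
    external_facing: bool = False


def is_removable(dev):
    """Removable if the upstream bridge is external-facing or removable."""
    parent = dev.parent
    if parent is None:
        return False
    return parent.external_facing or is_removable(parent)


# Topology taken from the udevadm output in this mail; the
# external_facing=True setting is an assumption.
root_port = PciDev("0000:00:1d.4", external_facing=True)
jhl6540 = PciDev("0000:05:00.0", parent=root_port)

print(is_removable(jhl6540))  # True: the soldered-down controller inherits the label
```

The rule has no way to tell a soldered-down controller from a
plugged-in device when both sit directly below an external-facing
port, which is the gap a quirk or a new firmware interface would have
to close.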

> Can you elaborate what the issue is and which mainline kernel you are
> using to reproduce this?
Thanks for this question! On a Lenovo ThinkPad X1 Carbon Gen 7/Gen 8
running Linux, when you look at the properties of the JHL6540
Thunderbolt controller, you see that it is incorrectly labeled as
removable. I have reproduced this bug on Linux 6.7-rc1 (commit
b85ea95d0864).

Before my patch, you see that the JHL6540 controller is inaccurately
labeled “removable”:
$ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | \
    grep -e {removable} -e {device} -e {vendor} -e looking
looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0':
ATTR{device}=="0x15d3"
ATTR{removable}=="removable"
ATTR{vendor}=="0x8086"
looking at parent device '/devices/pci0000:00/0000:00:1d.4':
ATTRS{device}=="0x02b4"
ATTRS{vendor}=="0x8086"
looking at parent device '/devices/pci0000:00':

After applying the patch in this thread, the JHL6540 controller is
labeled “fixed”:
$ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | \
    grep -e {removable} -e {device} -e {vendor} -e looking
looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0':
ATTR{device}=="0x15d3"
ATTR{removable}=="fixed"
ATTR{vendor}=="0x8086"
looking at parent device '/devices/pci0000:00/0000:00:1d.4':
ATTRS{device}=="0x02b4"
ATTRS{vendor}=="0x8086"
looking at parent device '/devices/pci0000:00':
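
To check every PCI device on a system rather than one address at a
time, a short Python sketch along these lines should work. It assumes
the kernel exposes the "removable" attribute under
/sys/bus/pci/devices/<address>/ (it is absent when the kernel does not
report removability for a device); the function name is hypothetical.

```python
from pathlib import Path


def list_removable_status(sysfs_pci="/sys/bus/pci/devices"):
    """Return {pci_address: removable_value} for devices exposing the attribute."""
    result = {}
    root = Path(sysfs_pci)
    if not root.is_dir():
        return result
    for dev in sorted(root.iterdir()):
        try:
            result[dev.name] = (dev / "removable").read_text().strip()
        except OSError:
            # Attribute missing or unreadable for this device: skip it.
            continue
    return result


if __name__ == "__main__":
    for addr, status in list_removable_status().items():
        print(f"{addr}: {status}")
```

On an affected machine, 0000:05:00.0 would be expected to show up as
"removable" before the patch and "fixed" after it, matching the
udevadm output above.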

Here is what I have come up with as a result of your comments.

I see two options for resolving this:
1) Fix it by adding a new firmware interface, as Bjorn Helgaas
brought up.
2) Alternatively, address it through a cleaned-up version of this
patch.
If the solution is to add a firmware interface, how would I go about
that process? Could you put me in touch with someone with that
know-how?
Would we have a temporary software quirk in place while the firmware
spec is being updated?
I am deferring to your expertise and knowledge in solving this bug.
Thank you for all your help.