Re: [PATCH] thunderbolt: Stop using iommu_present()

From: Robin Murphy
Date: Wed Mar 16 2022 - 15:18:10 EST


On 2022-03-16 18:34, Limonciello, Mario wrote:
[Public]

Can the USB4 CM make the device links in the DVSEC case perhaps too? I
would
think we want that anyway to control device suspend ordering.

If I had something discrete to try I'd dust off the DVSEC patch I wrote
before to
try it, but alas all I have is integrated stuff on my hand.

Mika, you might not have seen it yet, but I sent a follow up diff in this
thread
to Robin's patch. If that looks good Robin can submit a v2 (or I'm happy
to
do
so as well as I confirmed it helps my original intent too).

I saw it now and I'm thinking are we making this unnecessary complex? I
mean Microsoft solely depends on the DMAR platform opt-in flag:




I think Microsoft doesn't allow you to turn off the IOMMU though or put it
in
passthrough through on the kernel command line.

We also do turn on full IOMMU mappings in that case for devices that
are
marked as external facing by the same firmware that provided the
DMAR
bit. If the user decides to disable IOMMU from command line for
instance
then we expect she knows what she is doing.

Yeah, if external_facing is set correctly then we can safely expect the
the IOMMU layer to do the right thing, so in that case it probably is OK
to infer that if an IOMMU is present for the NHI then it'll be managing
that whole bus hierarchy. What I'm really thinking about here is whether
we can defend against a case when external_facing *isn't* set, so we
treat the tunnelled ports as normal PCI buses, assume it's OK since
we've got an IOMMU and everything else is getting translation domains
by
default, but then a Thunderbolt device shows up masquerading the
VID:DID
of something that gets a passthrough quirk, and thus tricks its way
through the perceived protection.

Robin.

Unless it happened after 5.17-rc8 looking at the code I think that's Intel
specific behavior though at the moment (has_external_pci). I don't see it
in a generic layer.

Ah, it's not necessarily the most obvious thing -
pci_dev->external_facing gets propagated through to pci_dev->untrusted
by set_pcie_untrusted(), and it's that that's then checked by
iommu_get_def_domain_type() to enforce a translation domain regardless
of default passthrough or quirks. It's then further checked by
iommu-dma's dev_is_untrusted() to enforce bounce-buffering to avoid data
leakage in sub-page mappings too.


Ah thanks for explaining it, that was immediately obvious to me.

In addition to the point Robin said about firmware not setting external
facing
if the IOMMU was disabled on command line then iommu_dma_protection
would be showing the wrong values meaning userspace may choose to
authorize the device automatically in a potentially unsafe scenario.

Even if the user "knew what they were doing", I would expect that we still
do our best to protect them from themselves and not advertise something
that will cause automatic authorization.

Might it be reasonable for the Thunderbolt core to check early on if any
tunnelled ports are not marked as external facing, and if so just tell
the user that iommu_dma_protection is off the table and anything they
authorise is at their own risk?

Robin.

How about in iommu_dma_protection_show to just check that all the device
links to the NHI are marked as untrusted?

Then if there are device links missing we solve that separately (discrete USB4
DVSEC case we just need to make those device links).

The feeling I'm getting from all this is that if we've got as far as iommu_dma_protection_show() then it's really too late to meaningfully mitigate bad firmware. We should be able to detect missing untrusted/external-facing properties as early as nhi_probe(), and if we could go into "continue at your own risk" mode right then *before* anything else happens, it all becomes a lot easier to reason about. If there's a strong enough impetus from Microsoft for system vendors to get their firmware right, hopefully we can get away with not trying too hard to cope with systems that haven't.

I'm inclined to send v2 of this patch effectively going back to my original (even simpler) cleanup, just now with much more reasoning about why it isn't doing more :)

Cheers,
Robin.