Re: [PATCH v3 1/1] PCI: Add translated request only flag for pci_enable_pasid()

From: Jason Gunthorpe
Date: Fri Feb 03 2023 - 13:52:43 EST


On Fri, Feb 03, 2023 at 12:20:45PM -0600, Bjorn Helgaas wrote:
> > The PCI rules are a bit complicated:
> > - A simple MemRd/Wr with a PASID will be routed according to the
> > address. This can be mis-routed
> > - A ATS translation request with a PASID is always routed to the host
> > bridge
>
> From a PCIe point of view, I think these cases are equivalent because
> a PASID prefix doesn't affect routing (sec 2.2.10.4). A Translation
> Request includes an Untranslated Address, and if that happens to look
> like a PCI bus address, I think it will be mis-routed just like a
> MemRd/Wr would be.

So, I trusted AMD when they said this is why their platform worked, I
didn't check carefully the ATS spec language.

But looking now, I agree. I don't see any language that says ATS is
routed differently, if anything it says it is routed the same way as
MemRd.

AMD folks?

If yes then this has all been the wrong direction.

> > > Do IOMMUs allocate (PASID, Untranslated Addresses) that look like
> > > PCI bus addresses?
> >
> > Yes, because it is mapped to a mm_struct userspace can use any mmap
> > to access any valid address as an IOVA and thus PASID tagged
> > translation must never become confused with bus addresses.
>
> If PCI bus addresses are carved out of the Translated Address arena,
> the MemRd/Wr TLPs should be fine,

Yes, that is my point that Translated requests are fine because they
already carry a Translated address, and by its nature a Translated
address cannot be mis-routed.

The spec actually talks about this whole issue see the NOTE at the end
of 10.1.1

Per that note, for Linux, the "implementation constraint on the TA" is
that the TA requires ACS so that Untranslated always routes to the TA.

The AMD defect is some BIOS's on some of their SOC's don't report ACS
for the MFD, even though it does presumably implement ACS.

> Are we on track to fix this before v6.2?

AMD? Did the patches you were talking about to fix the oops in the AMD
driver land? Can we rely on that to fix the black screen for v6.2?

Otherwise I think the PCI core is correct here and I'm loath to change
it to work around a GPU driver bug. The GPU driver should understand
that PASID is not supported because the PCI core says so. It shouldn't
crash and blackscreen.

Keep in mind, from a PCI perspective, this protection is a security
senstive check, described in the spec. If we drop it we expose
platforms using SVA to a potential security issue upon
mis-configuration.

So, I'd much rather leave the PCI alone and see that the AMD driver is
fixed.

Jason