Re: [PATCH v3 1/1] PCI: Add translated request only flag for pci_enable_pasid()

From: Jonathan Cameron
Date: Wed Feb 01 2023 - 09:09:29 EST


On Tue, 31 Jan 2023 22:36:27 -0400
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Tue, Jan 31, 2023 at 06:14:19PM -0600, Bjorn Helgaas wrote:
>
> > > AMD GPU is one of those devices.
> >
> > I guess you mean the AMD GPU has ATS, PRI, and PASID Capabilities?
> > And furthermore, that the GPU *always* uses Translated addresses with
> > PASID?
>
> I'm not versed in the spec lingo, but the GPU issues MemRd/Wrs with
> the translated bit set and no PASID header - which is the correct form
> for an address that was translated by ATS.

FWIW there is a capability bit and an enable bit in the PASID
capability/control registers that say whether a device can/should add
a PASID to a translated request. I think the intent is that a host can
sanity check AT requests to make sure the device isn't making them
up. To do that it needs the PASID. Not sure any hosts do this yet
though ;)

Not worth much, but I thought a device always sent the PASID on
translated requests, so I dug out the spec to check (I was wrong: it is
both optional and configurable).
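
To make the cap/enable pairing concrete, here is a rough sketch of the
check, not kernel code. The SKETCH_* names are made up for this mail, and
the bit 3 positions for the translated-request bits are my assumption from
reading the spec, so verify them before relying on this. Only the PASID
Enable bit (0x0001) matches PCI_PASID_CTRL_ENABLE in pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Matches PCI_PASID_CTRL_ENABLE in include/uapi/linux/pci_regs.h. */
#define SKETCH_PASID_CTRL_ENABLE      0x0001

/*
 * ASSUMPTION: "Translated Requests with PASID" Supported/Enable are
 * taken to be bit 3 of the PASID Capability and Control registers.
 * These names and positions are illustrative; check the spec.
 */
#define SKETCH_PASID_CAP_XLATED_REQ   0x0008
#define SKETCH_PASID_CTRL_XLATED_REQ  0x0008

/*
 * Given the raw 16-bit PASID Capability and Control register values,
 * report whether the device is expected to attach a PASID to
 * translated requests: PASID must be enabled, the device must
 * advertise the ability, and software must have turned it on.
 */
static bool sketch_pasid_on_translated(uint16_t cap, uint16_t ctrl)
{
	if (!(ctrl & SKETCH_PASID_CTRL_ENABLE))
		return false;
	if (!(cap & SKETCH_PASID_CAP_XLATED_REQ))
		return false;
	return (ctrl & SKETCH_PASID_CTRL_XLATED_REQ) != 0;
}
```

A device like the GPU discussed here would presumably return false from
this, i.e. its translated requests carry no PASID.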

>
> To get to that it issues ATS requests, and only the ATS related
> requests will carry the PASID.
>
> ATS related requests always route to the root port, which is why it is
> functionally equivalent to ACS RR/UF in these cases.
>
> Translated requests always route where they are supposed to go, even
> with P2P and things.
>
> > And this applies even if there is no ACS or ACS doesn't support
> > PCI_ACS_RR and PCI_ACS_UF.
> >
> > The black screen happens because ... ?
>
> AMD GPU driver bugs blow up if it cannot setup PASID.
>
> > I couldn't figure out the NULL pointer dereference. I expected it to
> > be from a BUG() or similar in report_iommu_fault(), but I don't see
> > that.
>
> IIRC it is a buggy error unwind handling in the AMD GPU driver.
>
> Jason