Re: [PATCH v2] iommu/dma: Add config for PCI SAC address trick

From: Robin Murphy
Date: Fri Jun 24 2022 - 10:54:15 EST


On 2022-06-24 14:28, Joerg Roedel wrote:
On Thu, Jun 23, 2022 at 12:41:00PM +0100, Robin Murphy wrote:
On 2022-06-23 12:33, Joerg Roedel wrote:
On Wed, Jun 22, 2022 at 02:12:39PM +0100, Robin Murphy wrote:
Thanks for your bravery!

It already starts, with that patch I am getting:

xhci_hcd 0000:02:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000f address=0xff00ffffffefe000 flags=0x0000]

In my kernel log. The device is an AMD XHCI controller and seems to
function normally after boot. The message disappears with
iommu.forcedac=0.

Need to look more into that...

Given how amd_iommu_domain_alloc() sets the domain aperture, presumably the
DMA address allocated was 0xffffffffffefe000? Odd that it gets bits punched
out in the middle rather than simply truncated off the top as I would have
expected :/

So even more weird, as a workaround I changed the AMD IOMMU driver to
allocate a 4-level page-table and limit the DMA aperture to 48 bits. I
still get the same message.

Hmm, in that case my best guess would be that somewhere between the device itself and the IOMMU input it's trying to sign-extend the address from bit 47 or lower, but for whatever reason bits 55:48 get lost.
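To make that guess concrete, here is a rough userspace sketch of the arithmetic (not kernel code). It assumes the IOVA handed out under the 48-bit aperture was 0x0000ffffffefe000, which is inferred from the fault address rather than observed; sign_extend_from() and drop_bits() are hypothetical helper names.

```c
#include <stdint.h>

/* Sign-extend addr from the given bit (0-based) up to bit 63. */
static uint64_t sign_extend_from(uint64_t addr, unsigned int bit)
{
	uint64_t sign = UINT64_C(1) << bit;
	uint64_t low = addr & ((sign << 1) - 1);	/* keep bits [bit:0] */

	/* Unsigned wraparound does the extension when the sign bit is set. */
	return (low ^ sign) - sign;
}

/* Clear bits [hi:lo] inclusive; assumes hi >= lo and hi - lo + 1 < 64. */
static uint64_t drop_bits(uint64_t addr, unsigned int hi, unsigned int lo)
{
	uint64_t mask = ((UINT64_C(1) << (hi - lo + 1)) - 1) << lo;

	return addr & ~mask;
}
```

Under that assumption, sign_extend_from(0x0000ffffffefe000, 47) gives 0xffffffffffefe000, and drop_bits() of that result with hi=55, lo=48 gives 0xff00ffffffefe000, i.e. the address in the fault message. XORing the two values leaves 0x00ff000000000000, which is exactly bits 55:48.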

Comparing the PCI xHCI I have to hand, mine (with nothing plugged in) only has 6 pages mapped for its command ring and other stuff. Thus unless it's sharing that domain with other devices, to be accessing something down in the second MB of IOVA space suggests that this probably isn't the very first access it's made, and therefore it would almost certainly have to be the endpoint emitting a corrupted address, but only for certain operations.
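For reference, the "second MB" figure can be checked with a small userspace sketch. It assumes IOVAs are handed out top-down from the aperture limit and that pages are 4K; bytes_below_top() and pages_below_top() are hypothetical helper names, not existing kernel functions.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4K pages */

/* Distance in bytes from iova up to the top of an aperture_bits-wide space. */
static uint64_t bytes_below_top(uint64_t iova, unsigned int aperture_bits)
{
	/* 2^64 does not fit in 64 bits, so represent it as 0 and let it wrap. */
	uint64_t top = aperture_bits >= 64 ? 0 : UINT64_C(1) << aperture_bits;

	return top - iova;
}

/* The same distance expressed in whole pages. */
static uint64_t pages_below_top(uint64_t iova, unsigned int aperture_bits)
{
	return bytes_below_top(iova, aperture_bits) >> SKETCH_PAGE_SHIFT;
}
```

With the presumed address 0xffffffffffefe000 and a 64-bit space, bytes_below_top() returns 0x102000, which is 258 pages or just over 1MB below the top, far more than the 6 pages a freshly probed, idle controller would be expected to have mapped. The 48-bit case (0x0000ffffffefe000 with aperture_bits=48) gives the same distance.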

FWIW I'd be inclined to turn on DMA debug and call debug_dma_dump_mappings() from the IOMMU fault handler, and/or add a bit of tracing to all the DMA mapping/allocation sites in the xHCI driver, to see what the offending address most likely represents.

Robin.