ASMedia ASM1062 (AHCI) hang after "ahci 0000:28:00.0: Using 64-bit DMA addresses"

From: Lennert Buytenhek
Date: Tue Jan 16 2024 - 07:36:53 EST


Hi,

On kernel 6.6.x, with an ASMedia ASM1062 (AHCI) controller, on an
ASUSTeK Pro WS WRX80E-SAGE SE WIFI mainboard, PCI ID 1b21:0612 and
subsystem ID 1043:858d, I got a total apparent controller hang,
rendering the two attached SATA devices unavailable, that was
immediately preceded by the following kernel messages:

[Thu Jan 4 23:12:54 2024] ahci 0000:28:00.0: Using 64-bit DMA addresses
[Thu Jan 4 23:12:54 2024] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00000 flags=0x0000]
[Thu Jan 4 23:12:54 2024] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00300 flags=0x0000]
[Thu Jan 4 23:12:54 2024] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00380 flags=0x0000]
[Thu Jan 4 23:12:54 2024] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00400 flags=0x0000]
[Thu Jan 4 23:12:54 2024] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00680 flags=0x0000]
[Thu Jan 4 23:12:54 2024] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00700 flags=0x0000]

It seems as if the controller has problems with 64-bit DMA addresses,
and the comments around the source of the message in
drivers/iommu/dma-iommu.c seem to point into that same direction:

/*
* Try to use all the 32-bit PCI addresses first. The original SAC vs.
* DAC reasoning loses relevance with PCIe, but enough hardware and
* firmware bugs are still lurking out there that it's safest not to
* venture into the 64-bit space until necessary.
*
* If your device goes wrong after seeing the notice then likely either
* its driver is not setting DMA masks accurately, the hardware has
* some inherent bug in handling >32-bit addresses, or not all the
* expected address bits are wired up between the device and the IOMMU.
*/
if (dma_limit > DMA_BIT_MASK(32) && dev->iommu->pci_32bit_workaround) {
iova = alloc_iova_fast(iovad, iova_len,
DMA_BIT_MASK(32) >> shift, false);
if (iova)
goto done;

dev->iommu->pci_32bit_workaround = false;
dev_notice(dev, "Using %d-bit DMA addresses\n", bits_per(dma_limit));
}

Are there any tests you can think of that I can run to further narrow
down this issue? By itself, the issue reproduces only rarely.

Thank you in advance.

Kind regards,
Lennert