Re: ASMedia ASM1062 (AHCI) hang after "ahci 0000:28:00.0: Using 64-bit DMA addresses"

From: Robin Murphy
Date: Wed Jan 24 2024 - 11:15:25 EST


On 24/01/2024 1:58 pm, Lennert Buytenhek wrote:
On Wed, Jan 24, 2024 at 02:40:51PM +0200, Lennert Buytenhek wrote:

There are two ways to handle this -- either set the DMA mask for ASM106x
parts to 43 bits, or take the lazy route and just use AHCI_HFLAG_32BIT_ONLY
for these parts. I feel that the former would be more appropriate, as
there seem to be plenty of bits beyond bit 31 that do work, but I will
defer to your judgement on this matter. What do you think the right way
to handle this apparent hardware quirk is?

I've seen something similar for NVMe, where some NVMe controllers from
Amazon were violating the spec and only supported 48-bit DMA addresses,
even though the NVMe spec requires support for 64-bit DMA addresses; see:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4bdf260362b3be529d170b04662638fd6dc52241

It is possible that ASMedia ASM1061 has a similar problem (but for AHCI)
and only supports 43-bit DMA addresses, even though it sets AHCI CAP.S64A,
which says "Indicates whether the HBA can access 64-bit data structures.".

I think the best thing is to do a similar quirk, where we set the dma_mask
accordingly.

I'll give that a try.

I've sent out a patch that appears (from printk debugging) to do the
right thing, but I haven't validated that the patch fixes the original
issue, as the original issue is not trivial to trigger, and the hardware
that it triggered on is currently unavailable.
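
For reference, the arithmetic behind a 43-bit DMA mask can be sketched in plain userspace C. This is only an illustration, not the patch itself: dma_bit_mask() mirrors the shape of the kernel's DMA_BIT_MASK() macro, and addr_fits_mask() is a hypothetical helper introduced here to show which bus addresses such a mask would allow.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace stand-in with the same shape as the kernel's DMA_BIT_MASK(n):
 * all ones in the low n bits, with n == 64 handled separately to avoid an
 * undefined 64-bit shift.
 */
static inline uint64_t dma_bit_mask(unsigned int n)
{
    return (n >= 64) ? ~UINT64_C(0) : ((UINT64_C(1) << n) - 1);
}

/*
 * Hypothetical helper (not a kernel function): true if addr has no bits
 * set above the given mask width, i.e. a device limited to that many
 * address bits could reach it.
 */
static inline bool addr_fits_mask(uint64_t addr, unsigned int bits)
{
    return (addr & ~dma_bit_mask(bits)) == 0;
}
```

With these definitions, dma_bit_mask(43) is 0x7FFFFFFFFFF, so any address below 8 TiB passes addr_fits_mask(addr, 43), while an address with bit 43 or higher set does not.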

The missing piece of the puzzle is that *something* has to use up all the available 32-bit IOVA space to make you spill over into the 64-bit space to begin with. It can happen just from having many large buffers mapped simultaneously (particularly if there are several devices in the same IOMMU group), or it could be that something is leaking DMA mappings over time.

An easy way to confirm the device behaviour should be to boot with "iommu.forcedac=1", then all devices will have their full DMA mask exercised straight away.
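
As a rough mental model of the above (a deliberately simplified sketch, not the kernel's IOVA allocator), the choice of upper bound for a new mapping can be written as below. The function name toy_iova_limit() and its parameters are made up for illustration; the only behaviour assumed is what is described above: addresses below 4 GiB are preferred until that space runs out, unless forcedac is set.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_MASK_32BIT UINT64_C(0xFFFFFFFF)

/*
 * Toy model: return the highest IOVA a new mapping may be given.
 *
 * dma_mask      - the device's advertised DMA mask
 * forcedac      - models booting with "iommu.forcedac=1"
 * low_exhausted - models the 32-bit IOVA space being fully used up
 */
static inline uint64_t toy_iova_limit(uint64_t dma_mask, bool forcedac,
                                      bool low_exhausted)
{
    /* A device with a mask of 32 bits or less never sees high addresses. */
    if (dma_mask <= TOY_MASK_32BIT)
        return dma_mask;

    /* Default policy: stay below 4 GiB for as long as there is room. */
    if (!forcedac && !low_exhausted)
        return TOY_MASK_32BIT;

    /* Spill-over or forcedac: the full mask is exercised. */
    return dma_mask;
}
```

In this model a device that advertises 64 bits but really only decodes 43 looks fine until low_exhausted becomes true, which is why the failure is hard to trigger on demand; with forcedac set, the full mask is used from the first mapping.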

Cheers,
Robin.

I've also made the quirk apply to all ASMedia ASM106x parts, because I
expect them to be affected by the same issue, but let's see what the
ASMedia folks have to say about that.

Thanks for your help!


Kind regards,
Lennert