Re: [RFC PATCH] iommu: arm-smmu-nvidia: Add default domain type implementation op

From: Stanimir Varbanov
Date: Mon Jul 31 2023 - 11:33:03 EST


Hi Thierry,

On 7/11/23 18:55, Thierry Reding wrote:
> On Tue, Jul 11, 2023 at 01:58:34PM +0300, Stanimir Varbanov wrote:
>> Hi Thierry,
>>
>> Thank you for the comments!
>>
>> On 7/10/23 13:40, Thierry Reding wrote:
>>> On Mon, Jul 10, 2023 at 11:22:52AM +0300, Stanimir Varbanov wrote:
>>>> Add def_domain_type implementation op and override default IOMMU
>>>> domain Kconfig option (CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y), which
>>>> could be enabled on some distros. The current quirk has been done
>>>> for Tegra234 machine, because I found the issue on it. The issue
>>>> itself appears on USB host controller which cannot be initialized
>>>> without IOMMU translation. In addition, we found that IOMMU
>>>> translation is needed for the display and GPU drivers as well.
>>>>
>>>> I evaluated a few possible options to solve that:
>>>>
>>>> a) select default IOMMU domain from .def_domain_type op
>>>> b) Set CONFIG_IOMMU_DEFAULT_PASSTHROUGH=n
>>>> c) add iommu.passthrough=0 on the kernel cmdline
>>>> d) firmware - ACPI / DT
>>>>
>>>> a) This option is implemented in the proposed patch.
>>>>
>>>> b) The community has agreed that pass-through is preferred as the
>>>> default IOMMU domain option because it avoids performance impacts on
>>>> some platforms [1]. On the other hand, we have examples where you
>>>> cannot even install a Linux distribution because the storage media
>>>> cannot be detected and the system just hangs.
>>>
>>> That's not how I read that thread. It sounds more to me like Will and
>>> Robin had ideas on how to improve the performance and were planning to
>>> address these issues. It doesn't exactly sound to me like there was
>>> consensus to make passthrough the default.
>>>
>>> Having said that, given that it's possible for distributions and users
>>> to set CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y, I think it would be useful in
>>> general to have a way of enforcing IOMMU translations if it's needed by
>>> the hardware.
>>
>> Exactly, the problem is that some platforms prefer passthrough to avoid
>> performance impacts but others cannot even boot the kernel (and thus
>> installation failure). Passing iommu.passthrough=0 should be an
>> administrator decision, balancing between security and performance.
>>
>> On the other hand, the aforementioned mail thread gave some performance
>> numbers which might be outdated given the improvements made to the SMMU
>> driver since then. Unfortunately, I cannot confirm that the performance
>> has improved in the meantime.
>>
>>>
>>> I'm not sure I fully understand the particular problems that you're
>>> seeing on Tegra234, though. I'm not aware of anything in the USB host
>>> controller driver (or hardware, for that matter) that would require the
>>> IOMMU to be enabled. The only peculiarity that I can think of is the
>>> firmware, which is typically loaded by an early bootloader and therefore
>>> might perhaps need the IOMMU to properly map this in the kernel.
>>> However, my understanding is that this firmware is loaded into special
>>> carveout regions which don't require remapping.
>>
>> On Jetson Orin AGX (R35.2.1) I see these errors:
>>
>> tegra-mc 2c00000.memory-controller: unknown: write @0x0000000000000080:
>> EMEM address decode error (EMEM decode error)
>>
>> tegra-xusb 3610000.usb: Error while assigning device slot ID
>> tegra-xusb 3610000.usb: Max number of devices this xHCI host supports is 36.
>> usb usb2-port3: couldn't allocate usb_device
>> tegra-mc 2c00000.memory-controller: unknown: write @0x0000000000000090:
>> EMEM address decode error (EMEM decode error)
>> tegra-xusb 3610000.usb: Error while assigning device slot ID
>> tegra-xusb 3610000.usb: Max number of devices this xHCI host supports is 36.
>> usb usb1-port3: couldn't allocate usb_device
>>
>> tegra-mc 2c00000.memory-controller: unknown: write @0x00000000000000a0:
>> EMEM address decode error (EMEM decode error)
>> tegra-xusb 3610000.usb: Error while assigning device slot ID
>> tegra-xusb 3610000.usb: Max number of devices this xHCI host supports is 36.
>> usb usb1-port4: couldn't allocate usb_device
>>
>>>
>>> However, passthrough is admittedly not something that we've thoroughly
>>> tested, so it's possible you're running into a use-case that I'm not
>>> aware of. In that case, could you provide a few more specifics (such as
>>> the DTB and .config) of your build configuration so that I can try and
>>> reproduce?
>>
>> To reproduce you have to add iommu.passthrough=1 on kernel cmdline. The
>> dtb is from Jetpack.
>
> I was able to reproduce this on Jetson Orin NX (the differences to AGX
> Orin should be negligible in this context), though I ended up patching
> the DTB to disable all SMMUs. What fixed it for me was to drop the
> dma-coherent property from the usb@3610000 node. Can you try that on
> your end to see if that works for you as well?
>

I can confirm that deleting the dma-coherent property from the
usb@3610000 DT node fixes the issue with the USB host controller for me.
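
For anyone who wants to try the same change, here is a minimal sketch of
a source-level override appended to the board .dts. The bus@0 parent
path is an assumption; the Jetpack DTB may place the node elsewhere, so
adjust the path to match your tree.

```dts
/* Sketch only: the bus@0 parent path is an assumption. */
/ {
	bus@0 {
		usb@3610000 {
			/* Drop the property so the device is treated as non-coherent. */
			/delete-property/ dma-coherent;
		};
	};
};
```

Note that /delete-property/ is processed by dtc when the source is
compiled, so the DTB has to be rebuilt for the change to take effect.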

~Stan