Re: [RFC PATCH] iommu: arm-smmu-nvidia: Add default domain type implementation op

From: Stanimir Varbanov
Date: Tue Jul 11 2023 - 06:58:50 EST


Hi Thierry,

Thank you for the comments!

On 7/10/23 13:40, Thierry Reding wrote:
> On Mon, Jul 10, 2023 at 11:22:52AM +0300, Stanimir Varbanov wrote:
>> Add a def_domain_type implementation op to override the default IOMMU
>> domain Kconfig option (CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y), which
>> may be enabled on some distros. The quirk currently covers only
>> Tegra234, because that is the machine I found the issue on. The issue
>> itself appears on the USB host controller, which cannot be initialized
>> without IOMMU translation. Furthermore, we found that IOMMU
>> translation is needed for the display and GPU drivers as well.
>>
>> I evaluated a few possible options to solve that:
>>
>> a) select default IOMMU domain from .def_domain_type op
>> b) Set CONFIG_IOMMU_DEFAULT_PASSTHROUGH=n
>> c) add iommu.passthrough=0 on the kernel cmdline
>> d) firmware - ACPI / DT
>>
>> a) This option is implemented in the proposed patch.
>>
>> b) The community has agreed that pass-through is preferred as the
>> default IOMMU domain option because it avoids performance impacts
>> on some platforms [1]. On the other hand, we have examples where a
>> Linux distribution cannot even be installed because the storage
>> media is not detected and the system just hangs.
>
> That's not how I read that thread. It sounds more to me like Will and
> Robin had ideas on how to improve the performance and were planning to
> address these issues. It doesn't exactly sound to me like there was
> consensus to make passthrough the default.
>
> Having said that, given that it's possible for distributions and users
> to set CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y, I think it would be useful in
> general to have a way of enforcing IOMMU translations if it's needed by
> the hardware.

Exactly. The problem is that some platforms prefer passthrough to avoid
performance impacts, while others cannot even boot the kernel (and thus
the installation fails). Passing iommu.passthrough=0 should be an
administrator decision that balances security against performance.
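
To make the intended precedence concrete, here is a minimal userspace
sketch, not the patch itself. It assumes the driver op may return 0 for
"no preference" and that a non-zero answer wins over the Kconfig/cmdline
default. The enum values, both function names and the match on the
"nvidia,tegra234" compatible string are illustrative assumptions only.

```c
#include <stdbool.h>
#include <string.h>

/* Stand-ins for the kernel's domain types; these values are illustrative
 * and do not match the definitions in include/linux/iommu.h. */
enum domain_type {
	DOMAIN_NO_PREFERENCE = 0,	/* driver op has no opinion */
	DOMAIN_IDENTITY,		/* passthrough, no translation */
	DOMAIN_DMA,			/* translated DMA */
};

/* Hypothetical driver op: request translation on Tegra234 only. */
enum domain_type nvidia_def_domain_type(const char *soc_compatible)
{
	if (strcmp(soc_compatible, "nvidia,tegra234") == 0)
		return DOMAIN_DMA;

	return DOMAIN_NO_PREFERENCE;
}

/* Simplified model of the selection: ask the driver first, then fall
 * back to the global default (Kconfig or iommu.passthrough=). */
enum domain_type pick_default_domain(const char *soc_compatible,
				     bool passthrough_default)
{
	enum domain_type type = nvidia_def_domain_type(soc_compatible);

	if (type != DOMAIN_NO_PREFERENCE)
		return type;

	return passthrough_default ? DOMAIN_IDENTITY : DOMAIN_DMA;
}
```

In this model Tegra234 gets a translated domain even when passthrough is
the global default, while other SoCs keep whatever the distro or the
administrator selected.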

On the other hand, the aforementioned mail thread gave some performance
numbers which might be outdated, given the improvements made to the SMMU
driver since then. Unfortunately, I cannot confirm that performance has
actually improved in the meantime.

>
> I'm not sure I fully understand the particular problems that you're
> seeing on Tegra234, though. I'm not aware of anything in the USB host
> controller driver (or hardware, for that matter) that would require the
> IOMMU to be enabled. The only peculiarity that I can think of is the
> firmware, which is typically loaded by an early bootloader and therefore
> might perhaps need the IOMMU to properly map this in the kernel.
> However, my understanding is that this firmware is loaded into special
> carveout regions which don't require remapping.

On Jetson Orin AGX (R35.2.1) I see these errors:

tegra-mc 2c00000.memory-controller: unknown: write @0x0000000000000080:
EMEM address decode error (EMEM decode error)

tegra-xusb 3610000.usb: Error while assigning device slot ID
tegra-xusb 3610000.usb: Max number of devices this xHCI host supports is 36.
usb usb2-port3: couldn't allocate usb_device
tegra-mc 2c00000.memory-controller: unknown: write @0x0000000000000090:
EMEM address decode error (EMEM decode error)
tegra-xusb 3610000.usb: Error while assigning device slot ID
tegra-xusb 3610000.usb: Max number of devices this xHCI host supports is 36.
usb usb1-port3: couldn't allocate usb_device

tegra-mc 2c00000.memory-controller: unknown: write @0x00000000000000a0:
EMEM address decode error (EMEM decode error)
tegra-xusb 3610000.usb: Error while assigning device slot ID
tegra-xusb 3610000.usb: Max number of devices this xHCI host supports is 36.
usb usb1-port4: couldn't allocate usb_device

>
> However, passthrough is admittedly not something that we've thoroughly
> tested, so it's possible you're running into a use-case that I'm not
> aware of. In that case, could you provide a few more specifics (such as
> the DTB and .config) of your build configuration so that I can try and
> reproduce?

To reproduce, add iommu.passthrough=1 to the kernel command line. The
DTB is from JetPack.

regards,
~Stan