Re: [PATCH] drivers/iommu: Ensure that the queue base address is successfully written during SMMU initialization.

From: ni.liqiang
Date: Wed Feb 21 2024 - 10:26:45 EST


>> The SMMU registers are accessed using Device-nGnRE attributes. It is
>> my understanding that, for Device-nGnRE, the Arm architecture requires
>> that writes to the same peripheral arrive at the endpoint in program
>> order.
>
> Yup, that's correct. The "nR" part means "non-Reordering", so something
> else is going on here.

Yes, the SMMU registers are accessed using Device-nGnRE attributes.

One additional point to note is: in cases where there is a failure writing to the CMDQ base address register, the testing environment was a multi-die, multi-socket server. This issue has not been observed on a single-die server. I apologize for omitting this information in my initial patch submission.

Over the past few days, I have referenced the kernel source code and ported the SMMU register initialization process. Through multiple stress tests, I have attempted to reproduce the CMDQ base address register write failure issue. The summarized results of my experiments are as follows:
1. When testing with one CPU core bound using taskset, the initialization process was executed 300,000 times without encountering the CMDQ base address register write failure issue. However, when not binding CPU using taskset, the issue was reproduced around 1,000 iterations into the test.
2. Without CPU binding, I inserted a memory barrier between accesses to the CMDQ_BASE register and CMDQEN register, similar to the modification made in the patch. After executing the initialization process 300,000 times, the CMDQ base address register write failure issue did not occur.

Based on these observations and joint analysis with CMN colleagues, we speculate that in the SMMU register initialization process, if the CPU core changes, and these CPUs are located on different dies, the underlying 4 CCG ports are utilized to perform die-to-die accesses. However, in our current strategy, these 4 CCG ports cannot guarantee ordering, resulting in the completion of CMDQEN writing before the completion of CMDQ base address writing.

>From the analysis above, it seems that modifying the die-to-die access strategy to achieve ordering of Device-nGnRE memory might be a better solution compared to adding a memory barrier?