Re: [PATCH] drivers/iommu: Ensure that the queue base address is successfully written during SMMU initialization.

From: Will Deacon
Date: Wed Feb 21 2024 - 11:09:06 EST


On Wed, Feb 21, 2024 at 11:26:29PM +0800, ni.liqiang wrote:
> >> The SMMU registers are accessed using Device-nGnRE attributes. It is
> >> my understanding that, for Device-nGnRE, the Arm architecture requires
> >> that writes to the same peripheral arrive at the endpoint in program
> >> order.
> >
> > Yup, that's correct. The "nR" part means "non-Reordering", so something
> > else is going on here.
>
> Yes, the SMMU registers are accessed using Device-nGnRE attributes.
>
> One additional point to note is: in cases where there is a failure writing
> to the CMDQ base address register, the testing environment was a
> multi-die, multi-socket server. This issue has not been observed on a
> single-die server. I apologize for omitting this information in my initial
> patch submission.

Uh-oh, smells like a hardware issue ;p
I wonder if Device-nGnRnE behaves any differently?

> Over the past few days, I have referenced the kernel source code and
> ported the SMMU register initialization process. Through multiple stress
> tests, I have attempted to reproduce the CMDQ base address register write
> failure issue. The summarized results of my experiments are as follows:
> 1. When testing with one CPU core bound using taskset, the initialization
> process was executed 300,000 times without encountering the CMDQ base
> address register write failure issue. However, when not binding CPU using
> taskset, the issue was reproduced around 1,000 iterations into the test.
> 2. Without CPU binding, I inserted a memory barrier between accesses to
> the CMDQ_BASE register and CMDQEN register, similar to the modification
> made in the patch. After executing the initialization process 300,000
> times, the CMDQ base address register write failure issue did not occur.
>
> Based on these observations and joint analysis with CMN colleagues, we
> speculate that in the SMMU register initialization process, if the CPU
> core changes, and these CPUs are located on different dies, the underlying
> 4 CCG ports are utilized to perform die-to-die accesses. However, in our
> current strategy, these 4 CCG ports cannot guarantee ordering, resulting
> in the completion of CMDQEN writing before the completion of CMDQ base
> address writing.

(Disclaimer: I don't know what a CCG port is)

Hmmm. The part that doesn't make sense to me here is that migrating between
CPUs implies context-switching, and we have a DSB on that path in
__switch_to(). So why would adding barriers to the driver help? Maybe it
just changes the timing?

> From the analysis above, it seems that modifying the die-to-die access
> strategy to achieve ordering of Device-nGnRE memory might be a better
> solution compared to adding a memory barrier?

I'm not sure what you're proposing, but I don't think Linux should be
changed to accomodate this.

Will