Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"

From: Will Deacon
Date: Fri Dec 02 2022 - 05:04:09 EST


On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote:
> On 02.12.22 09:26, Amit Pundir wrote:
> > On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
> >>
> >> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote:
> >>> Has any progress been made to fix this regression? It afaics is not a
> >>> release critical issue, but well, it still would be nice to get this
> >>> fixed before 6.1 is released.
> >>
> >> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit
> >> that exposed the driver bug. It doesn't fix the actual bug, it only
> >> makes it less likely to happen.
> >>
> >> I like the original commit removing the cache invalidation as it shows
> >> drivers not behaving properly
>
> Yeah, I understand that, but I guess it's my job to ask at this point:
> "is continuing to live with the old behavior for one or two more cycles
> that much of a problem?"

That wouldn't be a problem. The problem is that I haven't seen any efforts
from the Qualcomm side to actually fix the drivers so what makes you think
the issue will be addressed in one or two more cycles? On the other hand, if
there were patches out there trying to fix the drivers and we just needed to
revert this change to buy them some time, then that would obviously be the
right thing to do.

> >> but, as a workaround, we could add a
> >> command line option to force back the old behaviour (defaulting to the
> >> new one) until the driver is fixed.
>
> Well, sometimes that approach is fine to fix a regression, but I'm not
> sure this is one of those situations, as this...
>
> > We use DB845c extensively for mainline and android-mainline[1] testing
> > with AOSP, and it is broken for weeks now. So be it a temporary
> > workaround or a proper driver fix in place, we'd really appreciate a
> > quick fix here.
>
> ...doesn't sound like we are talking about some odd corner case
> here. But in the end that would be up to Linus to decide.

The issue is that these drivers are abusing the DMA API to manage buffers
which are being transferred to trustzone. Even with the revert, this is
broken (the CPU can speculate from the kernel's cacheable linear mapping
of memory), it just appears to be less likely with the CPUs on this SoC.
So we end up in a situation where the kernel is flaky on these devices
but with even less incentive for the drivers to be fixed.

As well as broken drivers, the patch has also identified broken device-tree
files where DMA-coherent devices were incorrectly being treated as
non-coherent:

https://lore.kernel.org/linux-arm-kernel/20221124142501.29314-1-johan+linaro@xxxxxxxxxx/

so I do think it's something that's worth having as the default behaviour.

> I'll point him to this thread once more in my weekly report anyway.
> Maybe I'll even suggest to revert this change, not sure yet.

As I said above, I think the revert makes sense if the drivers are actually
being fixed, but I'm not seeing any movement at all on that front.

Will