Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"

From: Manivannan Sadhasivam
Date: Fri Dec 02 2022 - 05:55:16 EST


On Fri, Dec 02, 2022 at 10:03:58AM +0000, Will Deacon wrote:
> On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote:
> > On 02.12.22 09:26, Amit Pundir wrote:
> > > On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
> > >>
> > >> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote:
> > >>> Has any progress been made to fix this regression? It afaics is not a
> > >>> release critical issue, but well, it still would be nice to get this
> > >>> fixed before 6.1 is released.
> > >>
> > >> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit
> > >> that exposed the driver bug. It doesn't fix the actual bug, it only
> > >> makes it less likely to happen.
> > >>
> > >> I like the original commit removing the cache invalidation as it shows
> > >> drivers not behaving properly
> >
> > Yeah, I understand that, but I guess it's my job to ask at this point:
> > "is continuing to live with the old behavior for one or two more cycles
> > that much of a problem?"
>
> That wouldn't be a problem. The problem is that I haven't seen any efforts
> from the Qualcomm side to actually fix the drivers so what makes you think
> the issue will be addressed in one or two more cycles? On the other hand, if
> there were patches out there trying to fix the drivers and we just needed to
> revert this change to buy them some time, then that would obviously be the
> right thing to do.
>

Qualcomm is working on fixing the driver. It's just that the patches are not
available yet. The delay is mainly due to the coordination that needs to
happen between the internal teams.

The fix would be to use a separate no-map carveout for this use case.

But it'd be good to revert this patch until those patches get merged.

Thanks,
Mani

> > >> but, as a workaround, we could add a
> > >> command line option to force back the old behaviour (defaulting to the
> > >> new one) until the driver is fixed.
> >
> > Well, sometimes that approach is fine to fix a regression, but I'm not
> > sure this is one of those situations, as this...
> >
> > > We use DB845c extensively for mainline and android-mainline[1] testing
> > > with AOSP, and it is broken for weeks now. So be it a temporary
> > > workaround or a proper driver fix in place, we'd really appreciate a
> > > quick fix here.
> >
> > ...doesn't sound like we are talking about some odd corner case
> > here. But in the end that would be up to Linus to decide.
>
> The issue is that these drivers are abusing the DMA API to manage buffers
> which are being transferred to trustzone. Even with the revert, this is
> broken (the CPU can speculate from the kernel's cacheable linear mapping
> of memory), it just appears to be less likely with the CPUs on this SoC.
> So we end up in a situation where the kernel is flakey on these devices
> but with even less incentive for the drivers to be fixed.
>
> As well as broken drivers, the patch has also identified broken device-tree
> files where DMA-coherent devices were incorrectly being treated as
> non-coherent:
>
> https://lore.kernel.org/linux-arm-kernel/20221124142501.29314-1-johan+linaro@xxxxxxxxxx/
>
> so I do think it's something that's worth having as the default behaviour.
>
> > I'll point him to this thread once more in my weekly report anyway.
> > Maybe I'll even suggest to revert this change, not sure yet.
>
> As I said above, I think the revert makes sense if the drivers are actually
> being fixed, but I'm not seeing any movement at all on that front.
>
> Will

--
மணிவண்ணன் சதாசிவம்