Re: [RFC PATCH] drm/ttm: force cached mappings for system RAM on ARM

From: Ard Biesheuvel
Date: Thu Jan 17 2019 - 03:02:53 EST


On Thu, 17 Jan 2019 at 07:07, Benjamin Herrenschmidt
<benh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, 2019-01-16 at 08:47 +0100, Ard Biesheuvel wrote:
> > > As far as I know on x86 it doesn't, so when you have an un-cached page
> > > you can still access it with a snooping DMA read/write operation and
> > > don't cause trouble.
> > >
> >
> > I think it is the other way around. The question is, on an otherwise
> > cache coherent device, whether the NoSnoop attribute set by the GPU
> > propagates all the way to the bus so that it bypasses the caches.
>
> On powerpc it's ignored, all DMA accesses will be snooped. But that's
> fine regardless of whether the memory was mapped cachable or not, the
> snooper will simply not find anything if not. I *think* we only do
> cache inject if the line already exists in one of the caches.
>

Others should correct me if I am wrong, but arm64 SoCs often have L3
system caches, and I would expect inbound transactions with writeback
write-allocate (WBWA) attributes to allocate there.

> > On x86, we can tolerate if this is not the case, since uncached memory
> > accesses by the CPU snoop the caches as well.
> >
> > On other architectures, uncached accesses go straight to main memory,
> > so if the device wrote anything to the caches we won't see it.
>
> Well, on all powerpc implementations that I am aware of at least (dunno
> about ARM), they do, but we don't have a problem because I don't think
> the devices can/will write to the caches directly unless a
> corresponding line already exists (but I might be wrong, we need to
> double check all implementations which is tricky).
>
> I am not aware of any powerpc chip implementing NoSnoop.
>

Do you have any history on why this optimization is disabled for power
unless CONFIG_NOT_COHERENT_CACHE is set?

That also raises the question of how any of this is supposed to work
with non-cache coherent DMA. The code appears to always assume cache
coherent DMA, and to offer uncached/write-combined mappings as an
optimization if drm_arch_can_wc_memory() returns true. So I wonder if
that helper should take a struct device pointer instead, and return
true for non-cache coherent devices.
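
To make the suggestion concrete, here is a rough standalone mock-up of
what a per-device version of the helper might look like. Everything in
it is invented for illustration: struct fake_device, its dma_coherent
field, the FAKE_ARCH_IS_X86 macro and the name can_use_wc_mapping() are
assumptions, not existing kernel interfaces. It only encodes the
reasoning from this thread.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct device; carries only what the sketch needs. */
struct fake_device {
	bool dma_coherent;	/* assumption: true if the device's DMA snoops the CPU caches */
};

/* Hypothetical stand-in for CONFIG_X86; defaults to "not x86". */
#ifndef FAKE_ARCH_IS_X86
#define FAKE_ARCH_IS_X86 0
#endif

/*
 * Hypothetical per-device replacement for the architecture-wide helper.
 *
 * - On x86, uncached CPU accesses snoop the caches as well, so an
 *   uncached/WC mapping is tolerated either way.
 * - Elsewhere, an uncached/WC mapping is only safe if the device's DMA
 *   does not go through the caches, i.e. the device is non-coherent.
 */
static inline bool can_use_wc_mapping(const struct fake_device *dev)
{
	if (FAKE_ARCH_IS_X86)
		return true;

	return !dev->dma_coherent;
}
```

With the default of FAKE_ARCH_IS_X86 being 0, this returns true for a
non-coherent device and false for a coherent one, which is the
behaviour I had in mind above.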

> > So to use this optimization, you have to either be 100% sure that
> > NoSnoop is implemented correctly, or have an x86 CPU.
> >
> > > > The old hack of using non-cached mapping to avoid snoop cost in AGP and
> > > > others is just that ... an ugly and horrible hack that should have
> > > > never eventuated, when the search for performance pushes HW people into
> > > > utter insanity :)
> > >
> > > Well I agree that un-cached system memory makes things much more
> > > complicated for a questionable gain.
> > >
> > > But fact is we now have to deal with the mess, so no point in
> > > complaining about it too much :)
> > >
> >
> > Indeed. I wonder if we should just disable it altogether unless CONFIG_X86=y
>
> The question is whether DMA from a device can instantiate cache lines
> in your system. This is a system specific rather than architecture
> specific question I suspect...
>

The ARM architecture permits it, afaict, and write-allocate is a hint
so the implementation is free to ignore it, whether it is set or
cleared.