Re: [PATCH] Revert "Revert "Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"""

From: Robin Murphy
Date: Fri Jun 16 2023 - 07:39:08 EST


On 2023-06-15 23:00, Doug Anderson wrote:
Hi,

On Thu, Jun 15, 2023 at 12:04 PM Robin Murphy <robin.murphy@xxxxxxx> wrote:

Presumably the fact that the firmware gets a physical address means
that the firmware needs to map this address somehow itself. I can try
to dig up what the firmware is doing if needed (what attributes it
uses to map), but I guess the hope is that it shouldn't matter.

It absolutely matters. Linux has been told (by DT) that this device does
not snoop caches, and therefore is acting on that information by using
the non-cacheable remap. There is nothing inherently wrong with that,
even when the "device" is actually firmware running on the same CPU -
EL3 could well run with the MMU off, or just make a point of not
accessing Non-Secure memory with cacheable attributes to avoid
side-channels. However if in this case the SCM firmware *is* using
cacheable attributes, as the symptoms would suggest, then either the
firmware or the DT is wrong, and there is nothing Linux can do to
completely fix that.

With help from minecrell on IRC, we've found that firmare _does_ map
it as cachable.


If this wild guessing is
correct, maybe a more correct solution would be to simply unmap the
memory from the kernel before passing the physical address to the
firmware, if that's possible...

Having now looked at the SCM driver, TBH it doesn't make an awful lot of
sense for it to be using dma_alloc_coherent() there anyway - it's not
using it as a coherent buffer, it's doing a one-off unidirectional
transfer of a relatively small amount of data in a manner which seems to
be exactly the usage model for the streaming DMA API. And I think using
the latter would happen to mitigate this problem too - with streaming
DMA you'd put the dma_map_page() call *after* all the buffer data has
been written, right before the SMC call, so even with a coherency
mismatch there would essentially be no opportunity for the caches to get
out of sync.

Switching to the streaming API for this function _does_ work, but for
now I'm not going to make this switch and instead going to go with the
fix to add "dma-coherent" [1]. That seems like the more correct fix
instead of just a mitigation.

Sure, if you're confident that it will *always* use cacheable attributes for any shared-memory "DMA", that makes sense.

If someone wants to switch the SCM
driver to the streaming APIs, that'd be cool too. On IRC, minecrell
pointed out that at least one function in this file,
qcom_scm_ice_set_key(), purposely didn't use the streaming APIs.

The joke there being that memzero_explicit() on the non-cacheable alias means a cached copy of the key data could still be visible via the linear map address after the buffer is freed - especially if one can rely on TF-A having just pulled the whole lot into caches by reading it through a read-write-allocate MT_MEMORY mapping. Security is hard :)

Thanks,
Robin.


[1] https://lore.kernel.org/r/20230615145253.1.Ic62daa649b47b656b313551d646c4de9a7da4bd4@changeid

-Doug