Re: [PATCH] dma-contiguous: Prioritize restricted DMA pool over shared DMA pool

From: Florian Fainelli
Date: Wed Feb 16 2022 - 15:39:46 EST


On 2/16/22 11:48 AM, Robin Murphy wrote:
> On 2022-02-16 17:37, Florian Fainelli wrote:
>> On 2/16/22 3:13 AM, Robin Murphy wrote:
>>> On 2022-02-15 22:43, Florian Fainelli wrote:
>>>> Some platforms might define the same memory region to be suitable for
>>>> used by a kernel supporting CONFIG_DMA_RESTRICTED_POOL while
>>>> maintaining
>>>> compatibility with older kernels that do not support that. This is
>>>> achieved by declaring the node this way;
>>>
>>> Those platforms are doing something inexplicably wrong, then.
>>
>> Matter of perspective I guess.
>>
>>>
>>> "restricted-dma-pool" says that DMA for the device has to be bounced
>>> through a dedicated pool because it can't be trusted with visibility of
>>> regular OS memory. "reusable" tells the OS that it's safe to use the
>>> pool as regular OS memory while it's idle. Do you see how those concepts
>>> are fundamentally incompatible?
>>
>>  From the perspective of the software or firmware agent that is
>> responsible for setting up the appropriate protection on the reserved
>> memory, it does not matter what the compatible string is, the only
>> properties that matter are the base address, size, and possibly whether
>> 'no-map' is specified or not to set-up appropriate protection for the
>> various memory controller agents (CPU, PCIe, everything else).
>>
>> Everything else is just information provided to the OS in order to
>> provide a different implementation keyed off the compatible string. So
>> with that in mind, you can imagine that before the introduction of
>> 'restricted-dma-pool' in 5.15, some platforms already had such a concept
>> of a reserved DMA region, that was backed by a device private CMA pool,
>> they would allocate memory from that region and would create their own
>> middle layer for bounce buffering if they liked to. This is obviously
>> not ideal on a number of levels starting from not being done at the
>> appropriate level but it was done.
>>
>> Now that 'restricted-dma-pool' is supported, transitioning them over is
>> obviously better and updating the compatible string for those specific
>> regions to include the more descriptive 'restrictded-dma-pool' sounded
>> to me as an acceptable way to maintain forward/backward DTB
>> compatibility rather than doubly reserving these region one with the
>> "old" compatible and one with the "new" compatible, not that the system
>> is even capable of doing that anyway, so we would have had to
>> essentially make them adjacent.
>>
>> And no, we are not bringing Linux version awareness to our boot loader
>> mangling the Device Tree blob, that's not happening, hence this patch.
>
> If the patch was adding a "brcm,insecure-dma-pool" compatible and
> hooking it up, I'd be less bothered. As it is, I remain unconvinced that
> describing two things that are not interchangeable with each other as
> interchangeable with each other is in any way "better".

We most definitively should have done that but we did not because we
sort of like to maintain as fewer patches as possible against the
mainline kernel, believe it or not. Also, it would have been fun to
explain why we went with our own compatible string to obtain the same
semantics from the kernel as the generic one, but I will stick a pin in
that idea.

>
>>> Linux is right to reject contradictory information rather than attempt
>>> to guess at what might be safe or not.
>>
>> The piece of contradictory information here specifically is the
>> 'reusable' boolean property, but as I explain the commit message
>> message, if you have a "properly formed" 'restricted-dma-pool' region
>> then it should not have that property in the first place, but even if it
>> does, it does not harm anything to have it.
>>
>>>
>>> Furthermore, down at the practical level, a SWIOTLB pool is used for
>>> bouncing streaming DMA API mappings, while a coherent/CMA pool is used
>>> for coherent DMA API allocations; they are not functionally
>>> interchangeable either. If a device depends on coherent allocations
>>> rather than streaming DMA, it should still have a coherent pool even
>>> under a physical memory protection scheme, and if it needs both
>>> streaming DMA and coherent buffers then it can have both types of pool -
>>> the bindings explicitly call that out. It's true that SWIOTLB pools can
>>> act as an emergency fallback for small allocations for I/O-coherent
>>> devices, but that's not really intended to be relied upon heavily.
>>>
>>> So no, I do not see anything wrong with the current handling of
>>> nonsensical DTs here, sorry.
>>
>> There is nothing wrong in the current code, but with changes that have
>> no adverse effect on "properly" constructed reserved memory entries we
>> can accept both types of reservation and maintain forward/backward
>> compatibility in our case. So why not?
>
> Would you be happy to give me blanket permission to point a gun at your
> foot and pull the trigger at any point in the future, if right now I
> show you an unloaded gun?
>
> Security and lazy shortcuts do not mix well.

Security is not enforced by the kernel here but by a piece of hardware
external to the CPU, that is true of the platforms that prompted the
'restricted-dma-pool' work in the first place by Claire. The compatible
string is irrelevant to that entity. There is no question this is lazy,
that is the whole point actually. The kernel is just communicated a
reserved memory region, if it can support a given behavior and expose a
set of services to do that, great, we leverage that, else we fall back
to another kind that we can also use, albeit differently.

> You are literally arguing
> that mainline Linux should support a back-door ABI for illegal DT
> properties which at worst has the potential to defeat a generic security
> feature.

Not quite, I am arguing that in the event that someone thought it was
possible to maintain his/her own notion of backward and forward
compatibility we *could* be supporting this. and with minimal amounts of
code to the kernel.

The kernel is not a trusted entity in the security model revolving
around the 'restricted-dma-pool' anyway which is why it is told where
these reserved memory regions are defined, and does not decide where to
place them. If the aperture is not set-up properly there is no DMA
to/from that region period so the results are pretty clear: you do not
get your device to work. Whether the kernel thinks that region is a
device-private CMA pool or a device-private SWIOTLB pool is largely
irrelevant in the security model that is a kernel mechanism on how you
want to leverage the region.

> The "restricted-dma-pool" binding explicitly says "When using
> this, the no-map and reusable properties must not be set" (I should spin
> up a patch enforcing that in the schema...).
> No matter how convinced you
> are that no OS past present or future could possibly ever behave
> differently from the particular downstream software stack you care
> about, NAK to subverting the "restricted-dma-pool" compatible in any
> way, sorry.
> I for one wish to have no part in the next
> trendy-name-compromise down the line where a protected VM can be tricked
> into exposing its page cache to a "DMA attack" by an untrusted
> hypervisor because fixing Florian's bootloader is hard.

I suppose your concern is that if we accept my patch, one could build a
kernel with CONFIG_DMA_RESTRICTED_POOL turned off to obtain a device
private CMA pool instead of a truly restricted DMA pool, then the
hypervisor allocates from that area and put sensitive stuff in there,
but later on we let a malicious VM-owned PCIe VF or actual PF DMA from
that location and we exfiltrate out of that location?

That exists today however as soon as an un-trusted DTB can be passed to
that hypervisor, so what is new here that I am missing? If you use the
example I provide in the patch, and given how the code is structured and
the link order, you will get a device private CMA pool instead of a
restricted DMA pool because kernel/dma/contiguous.c will match that
entry before kernel/dma/swiotlb.c (if built) does. Similarly if I can
mangle the DTB passed to the kernel to substitute "restricted-dma-pool'
with 'shared-dma-pool' I would obtain a different behavior for the
kernel than originally intended. The point though is that there should
be adequate security enforced around the reserved memory region in a way
that is orthogonal to the kernel.

Anyway, I thought this might be acceptable, I would like to hear from
others just in case, but will not pursue this router anymore unless
someone tells me otherwise. Thank you, I guess.
--
Florian