Re: [PATCH] crypto: gcm - fix cacheline sharing

From: Horia Geanta
Date: Thu May 30 2019 - 09:22:16 EST


On 5/30/2019 11:08 AM, Ard Biesheuvel wrote:
> On Thu, 30 May 2019 at 09:46, Horia Geanta <horia.geanta@xxxxxxx> wrote:
>>
>> On 5/30/2019 8:34 AM, Herbert Xu wrote:
>>> On Wed, May 29, 2019 at 01:27:28PM -0700, Eric Biggers wrote:
>>>>
>>>> So what about the other places that also pass an IV located next to the data,
>>>> like crypto/ccm.c and crypto/adiantum.c? If we're actually going to make this a
>> Fix for ccm is WIP.
>> We were not aware of adiantum since our crypto engine does not accelerate it.
>>
>>>> new API requirement, then we need to add a debugging option that makes the API
>>>> detect this violation so that the other places can be fixed too.
>>>>
>> IMO this is not a new crypto API requirement.
>> crypto API and its users must follow DMA API rules, besides crypto-specific ones.
>>
>> In this particular case, crypto/gcm.c is both an implementation and a crypto API
>> user, since it uses underneath ctr(aes) (and ghash).
>> Currently generic gcm implementation is breaking DMA API, since part of the dst
>> buffer (auth_tag) provided to ctr(aes) is sharing a cache line with some other
>> data structure (iv).
>>
>> The DMA API rule is mentioned in Documentation/DMA-API.txt
>>
>> .. warning::
>>
>> Memory coherency operates at a granularity called the cache
>> line width. In order for memory mapped by this API to operate
>> correctly, the mapped region must begin exactly on a cache line
>> boundary and end exactly on one (to prevent two separately mapped
>> regions from sharing a single cache line).
>>
>>
>
> This is overly restrictive, and not in line with reality. The whole
> networking stack operates on buffers shifted by 2 bytes if
> NET_IP_ALIGN is left at its default value of 2. There are numerous
> examples in other places as well.
>
> Given that kmalloc() will take the cacheline granularity into account
> if necessary, the only way this issue can hit is when a single kmalloc
> buffer is written to by two different masters.
>
I guess there are only two options:
-either cache line sharing is avoided OR
-users need to be *aware* they are sharing the cache line and some rules /
assumptions are in place on how to safely work on the data

What you are probably saying is that 2nd option is sometimes the way to go.

>>>> Also, did you consider whether there's any way to make the crypto API handle
>>>> this automatically, so that all the individual users don't have to?
>> That would probably work, but I guess it would come up with a big overhead.
>>
>> I am thinking crypto API would have to check each buffer used by src, dst
>> scatterlists is correctly aligned - starting and ending on cache line boundaries.
>>
>>>
>>> You're absolutely right Eric.
>>>
>>> What I suggested in the old thread is non-sense. While you can
>>> force GCM to provide the right pointers you cannot force all the
>>> other crypto API users to do this.
>>>
>> Whose problem is that crypto API users don't follow the DMA API requirements?
>>
>>> It would appear that Ard's latest suggestion should fix the problem
>>> and is the correct approach.
>>>
>> I disagree.
>> We shouldn't force crypto implementations to be aware of such inconsistencies in
>> the I/O data buffers (pointed to by src/dst scatterlists) that are supposed to
>> be safely DMA mapped.
>>
>
> I'm on the fence here. On the one hand, it is slightly dodgy for the
> GCM driver to pass a scatterlist referencing a buffer that shares a
> cacheline with another buffer passed by an ordinary pointer, and for
> which an explicit requirement exists that the callee should update it
> before returning.
>
> On the other hand, I think it is reasonable to require drivers not to
> perform such updates while the scatterlist is mapped for DMA, since
> fixing it in the callers puts a disproportionate burden on them, given
> that non-coherent DMA only represents a small minority of use cases.
>
The problem with this approach is that the buffers in the scatterlist could
hypothetically share cache lines with *any* other CPU-updated data, not just the
IV in the crypto request (as it happens here).
How could a non-coherent DMA implementation cope with this?

Thanks,
Horia