Re: [PATCH] crypto: gcm - fix cacheline sharing

From: Ard Biesheuvel
Date: Wed May 29 2019 - 18:20:10 EST


On Wed, 29 May 2019 at 22:27, Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
>
> On Wed, May 29, 2019 at 08:10:56PM +0300, Iuliana Prodan wrote:
> > The generic GCM driver should ensure that whatever it passes into
> > scatterlists is safe for non-cache coherent DMA.
> > The issue was seen while running GCM on CAAM driver. But, since CAAM
> > does not support GHASH on i.MX6, only CTR skcipher part of the GCM is
> > offloaded.
> > The skcipher request received by CAAM has req->src pointing to
> > auth_tag[16] and req->iv pointing to iv[16]. The problem is that when
> > the IV is updated (the crypto API requires skcipher implementations to
> > update the IV with the last ciphertext block), it is written to iv[16],
> > which is on the same cacheline as auth_tag[16] that was previously
> > DMA mapped.
> > The solution is to use a cacheline-aligned pointer, instead of the
> > auth_tag buffer, for encryption/decryption and then free it on completion.
> >
> > Link: https://lore.kernel.org/linux-crypto/20190208114459.5nixe76xmmkhur75@xxxxxxxxxxxxxxxxxxx/
> > Cc: <stable@xxxxxxxxxxxxxxx> # v4.19+
> > Fixes: adcbc688fe2f ("crypto: gcm - Convert to new AEAD interface")
> > Suggested-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
> > Signed-off-by: Iuliana Prodan <iuliana.prodan@xxxxxxx>
> >
...
> So what about the other places that also pass an IV located next to the data,
> like crypto/ccm.c and crypto/adiantum.c? If we're actually going to make this a
> new API requirement, then we need to add a debugging option that makes the API
> detect this violation so that the other places can be fixed too.
>
> Also, doing a kmalloc() per request is inefficient and very error-prone. In
> fact there are at least 3 bugs here: (1) not checking the return value, (2)
> incorrectly using GFP_KERNEL when it may be atomic context, and (3) not always
> freeing the memory. Why not use cacheline-aligned memory within the request
> context, so that a separate kmalloc() isn't needed?
>
> Also, did you consider whether there's any way to make the crypto API handle
> this automatically, so that all the individual users don't have to?
>

Reading back that old thread, it appears that the core issue is that
the IV is copied when the scatterlist is already mapped for DMA. This
means the cacheline covering the IV and the auth tag is dirty while
the non-coherent DMA transaction takes place. Given that we clean
rather than invalidate the start and end of DMA mappings if they are
not aligned to the cache writeback granule size, whatever sits in the
cacheline overwrites whatever the device wrote in there.
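
To make the sharing concrete, here is a small userspace sketch (not
kernel code). The struct names, the 64-byte line size and the helper
functions are assumptions made up for illustration; offsets are taken
relative to a struct base that is assumed to be line aligned.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof, size_t */

/* Assumed line size, for illustration only; real CPUs differ. */
#define CACHELINE 64

/* Hypothetical layout: IV and auth tag packed back to back. */
struct packed_ctx {
	unsigned char iv[16];
	unsigned char auth_tag[16];
};

/* Hypothetical layout: the DMA-visible tag starts on its own line. */
struct split_ctx {
	unsigned char iv[16];
	_Alignas(CACHELINE) unsigned char auth_tag[16];
};

/* The aligned member pads the struct to whole lines. */
static_assert(sizeof(struct split_ctx) % CACHELINE == 0,
	      "split_ctx should cover whole cache lines");

/* Index of the line holding byte offset 'off' from an aligned base. */
static size_t line_of(size_t off)
{
	return off / CACHELINE;
}

/*
 * Returns 1 if the byte ranges [a, a + alen) and [b, b + blen) touch
 * at least one common cache line, 0 otherwise. Lengths must be > 0.
 */
static int shares_line(size_t a, size_t alen, size_t b, size_t blen)
{
	return line_of(a) <= line_of(b + blen - 1) &&
	       line_of(b) <= line_of(a + alen - 1);
}
```

Calling shares_line() with the offsetof() values of iv and auth_tag
reports sharing for packed_ctx and no sharing for split_ctx. Whether
padding like this is acceptable in a request context is a separate
question that the sketch does not try to answer.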

Iuliana, did you try pulling the IV copy forward? I.e.,

diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index c0ece44f303b..11e91c0c9a96 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -1835,11 +1835,6 @@ static int skcipher_decrypt(struct skcipher_request *req)
 	u32 *desc;
 	int ret = 0;

-	/* allocate extended descriptor */
-	edesc = skcipher_edesc_alloc(req, DESC_JOB_IO_LEN * CAAM_CMD_SZ);
-	if (IS_ERR(edesc))
-		return PTR_ERR(edesc);
-
 	/*
 	 * The crypto API expects us to set the IV (req->iv) to the last
 	 * ciphertext block.
@@ -1848,6 +1843,11 @@ static int skcipher_decrypt(struct skcipher_request *req)
 		scatterwalk_map_and_copy(req->iv, req->src, req->cryptlen -
 					 ivsize, ivsize, 0);

+	/* allocate extended descriptor */
+	edesc = skcipher_edesc_alloc(req, DESC_JOB_IO_LEN * CAAM_CMD_SZ);
+	if (IS_ERR(edesc))
+		return PTR_ERR(edesc);
+
 	/* Create and submit job descriptor*/
 	init_skcipher_job(req, edesc, false);
 	desc = edesc->hw_desc;

This should ensure that the cacheline is cleaned when the DMA mapping
is created.
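
As a rough illustration of why the ordering matters, below is a toy
userspace model of a single write-back cache line shared by an IV and
an auth tag. Everything in it is hypothetical: the 32-byte line, the
struct, the helper names, and the simplification that both map and
unmap clean and then invalidate a partially covered line. It is not
how any architecture's DMA code is actually written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINE    32	/* toy line size, an assumption */
#define IV_OFF   0	/* iv[16] at the start of the line */
#define TAG_OFF 16	/* auth_tag[16] right behind it */

struct toy_line {
	unsigned char mem[LINE];	/* backing RAM, what the device sees */
	unsigned char cache[LINE];	/* the CPU's copy of the line */
	int valid, dirty;
};

/* CPU store: fill the line on a miss, then modify it and mark it dirty. */
static void cpu_write(struct toy_line *t, size_t off, const void *src, size_t n)
{
	assert(off + n <= LINE);
	if (!t->valid) {
		memcpy(t->cache, t->mem, LINE);
		t->valid = 1;
	}
	memcpy(t->cache + off, src, n);
	t->dirty = 1;
}

/* Clean (write back if dirty), then invalidate the whole line. */
static void clean_inv(struct toy_line *t)
{
	if (t->valid && t->dirty)
		memcpy(t->mem, t->cache, LINE);
	t->valid = t->dirty = 0;
}

/* Non-coherent device write: goes straight to RAM, bypassing the cache. */
static void dev_write(struct toy_line *t, size_t off, const void *src, size_t n)
{
	assert(off + n <= LINE);
	memcpy(t->mem + off, src, n);
}

/* Returns 1 if the device-written tag is intact after unmap, else 0. */
static int tag_survives(int copy_iv_before_map)
{
	struct toy_line t;
	unsigned char iv[16], tag[16];

	memset(&t, 0, sizeof(t));
	memset(iv, 0xAA, sizeof(iv));
	memset(tag, 0x55, sizeof(tag));

	if (copy_iv_before_map)
		cpu_write(&t, IV_OFF, iv, sizeof(iv));
	clean_inv(&t);				/* DMA map */
	if (!copy_iv_before_map)
		cpu_write(&t, IV_OFF, iv, sizeof(iv));	/* dirties mapped line */
	dev_write(&t, TAG_OFF, tag, sizeof(tag));	/* device produces tag */
	clean_inv(&t);				/* DMA unmap */

	return memcmp(t.mem + TAG_OFF, tag, sizeof(tag)) == 0;
}
```

In this model tag_survives(1), the IV copy done before the mapping,
keeps the device's data, while tag_survives(0) writes the stale line
back over it at unmap time.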