Re: [PATCH URGENT crypto] crypto: arm64/chacha - correctly walk through blocks

From: Eric Biggers
Date: Wed Mar 18 2020 - 20:24:03 EST


Hi Jason,

On Wed, Mar 18, 2020 at 05:45:18PM -0600, Jason A. Donenfeld wrote:
> Prior, passing in chunks of 2, 3, or 4, followed by any additional
> chunks would result in the chacha state counter getting out of sync,
> resulting in incorrect encryption/decryption, which is a pretty nasty
> crypto vuln, dating back to 2018. WireGuard users never experienced this
> prior, because we have always, out of tree, used a different crypto
> library, until the recent Frankenzinc addition. This commit fixes the
> issue by advancing the pointers and state counter by the actual size
> processed.
>
> Fixes: f2ca1cbd0fb5 ("crypto: arm64/chacha - optimize for arbitrary length inputs")
> Reported-and-tested-by: Emil Renner Berthing <kernel@xxxxxxxx>
> Signed-off-by: Jason A. Donenfeld <Jason@xxxxxxxxx>
> Cc: Ard Biesheuvel <ardb@xxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx

Thanks for fixing this! We definitely should get this fix to Linus for 5.6.
But I don't think your description of this bug dating back to 2018 is accurate,
because this bug only affects the new library interface to ChaCha20 which was
added in v5.5. In the "regular" crypto API case, the "walksize" is set to
'5 * CHACHA_BLOCK_SIZE', and chacha_doneon() is guaranteed to be called with a
multiple of '5 * CHACHA_BLOCK_SIZE' except at the end. Thus the code worked
fine with the regular crypto API.

In fact we have fuzz tests for the regular crypto API which find bugs exactly
like these. For example, they try dividing the data up randomly into chunks.
It would be great if the new library interface had fuzz tests too.

> diff --git a/arch/arm64/crypto/chacha-neon-glue.c b/arch/arm64/crypto/chacha-neon-glue.c
> index c1f9660d104c..debb1de0d3dd 100644
> --- a/arch/arm64/crypto/chacha-neon-glue.c
> +++ b/arch/arm64/crypto/chacha-neon-glue.c
> @@ -55,10 +55,10 @@ static void chacha_doneon(u32 *state, u8 *dst, const u8 *src,
> break;
> }
> chacha_4block_xor_neon(state, dst, src, nrounds, l);
> - bytes -= CHACHA_BLOCK_SIZE * 5;
> - src += CHACHA_BLOCK_SIZE * 5;
> - dst += CHACHA_BLOCK_SIZE * 5;
> - state[12] += 5;
> + bytes -= l;
> + src += l;
> + dst += l;
> + state[12] += round_up(l, CHACHA_BLOCK_SIZE) / CHACHA_BLOCK_SIZE;

Use DIV_ROUND_UP(l, CHACHA_BLOCK_SIZE)?

- Eric