Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations

From: Ard Biesheuvel
Date: Fri Oct 06 2023 - 19:34:06 EST


On Fri, 6 Oct 2023 at 23:01, He-Jie Shih <bignose1007@xxxxxxxxx> wrote:
>
> On Oct 7, 2023, at 03:47, Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> > On Fri, Sep 15, 2023 at 11:21:28AM +0800, Jerry Shih wrote:
> >> On Sep 15, 2023, at 09:48, He-Jie Shih <bignose1007@xxxxxxxxx> wrote:
> >> The OpenSSL PR is at [1].
> >> And we are from SiFive.
> >>
> >> -Jerry
> >>
> >> [1]
> >> https://github.com/openssl/openssl/pull/21923
> >
> > Hi Jerry, I'm wondering if you have an update on this? Do you need any help?
>
> We have specialized aes-cbc/ecb/ctr patch locally and pass the `testmgr` test
> cases. But the test patterns in `testmgr` are quite simple, I think it doesn't test the
> corner case(e.g. aes-xts with tail element).
>

There should be test cases for that.

> For aes-xts, I'm trying to update the implementation from OpenSSL. The design
> philosophy is different between OpenSSL and linux. In linux crypto, the data will
> be split into `scatterlist`. I need to preserve the aes-xts's iv for each scatterlist
> entry call.

Yes, this applies to all block ciphers that take an IV.

> And I'm thinking about how to handle the tail data in a simple way.

The RISC-V vector ISA is quite advanced, so there may be a better
trick using predicates etc but otherwise, I suppose you could reuse
the same trick that other asm implementations use, which is to use
unaligned loads and stores for the final blocks, and to use a vector
permute with a permute table to shift the bytes in the registers. But
this is not performance critical, given that existing in-kernel users
use sector or page size inputs only.

> By the way, the `xts(aes)` implementation for arm and x86 are using
> `cra_blocksize= AES_BLOCK_SIZE`. I don't know why we need to handle the tail
> element. I think we will hit `EINVAL` error in `skcipher_walk_next()` if the data size
> it not be a multiple of block size.
>

No, both XTS and CBC-CTS permit inputs that are not a multiple of the
block size, and will use some form of ciphertext stealing for the
final tail. There is a generic CTS template that wraps CBC but
combining them in the same way (e.g., using vector permute) will speed
up things substantially. *However*, I'm not sure how relevant CBC-CTS
is in the kernel, given that only fscrypt uses it IIRC but actually
prefers something else so for new systems perhaps you shouldn't
bother.

> Overall, we will have
> 1) aes cipher
> 2) aes with cbc/ecb/ctr/xts mode
> 3) sha256/512 for `vlen>=128` platform
> 4) sm3 for `vlen>=128` platform
> 5) sm4
> 6) ghash
> 7) `chacha20` stream cipher
>
> The vector crypto pr in OpenSSL is under reviewing, we are still updating the
> perl file into linux.
>
> The most complicated `gcm(aes)` mode will be in our next plan.
>
> > I'm also wondering about riscv.pm and the choice of generating the crypto
> > instructions from .words instead of using the assembler. It makes it
> > significantly harder to review the code, IMO. Can we depend on assembler
> > support for these instructions, or is that just not ready yet?
>
> I have asked the same question before[1]. The reason is that Openssl could use
> very old compiler for compiling. Thus, the assembler might not know the standard
> rvv 1.0[2] and other vector crypto[3] instructions. That's why we use opcode for all
> vector instructions. IMO, I would prefer to use opcode for `vector crypto` only. The
> gcc-12 and clang-14 are already supporting rvv 1.0. Actually, I just read the `perl`
> file instead of the actually generated opcode for OpenSSL pr reviewing. And it's
> not hard to read the perl code.
>

I understand the desire to reuse code, and OpenSSL already relies on
so-called perlasm for this, but I think this is not a great choice,
and I actually think this was a mistake for RISC-V. OpenSSL relies on
perlasm for things like emittting different function pro-/epilogues
depending on the calling convention (SysV versus MS on x86_64, for
instance), but RISC-V does not have that much variety, and already
supports the insn_r / insn_i pseudo instructions to emit arbitrary
opcodes while still supporting named registers as usual. [Maybe my
experience does not quite extrapolate to the vector ISA, but I managed
to implement scalar AES [0] using the insn_r and insn_i pseudo
instructions (which are generally provided by the assembler but Linux
has fallback CPP macros for them as well), and this results on much
more maintainable code IMO.[

We are using some of the OpenSSL perlasm in the kernel already (and
some of it was introduced by me) but I don't think we should blindly
reuse all of the RISC-V code if some of it can straight-forwardly be
written as normal .S files.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=riscv-scalar-aes