Re: [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations

From: Eric Biggers
Date: Sat Nov 11 2023 - 12:52:30 EST


On Sat, Nov 11, 2023 at 09:08:31PM +1000, Ard Biesheuvel wrote:
> On Fri, 10 Nov 2023 at 15:44, Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> >
> > On Fri, Nov 10, 2023 at 12:58:12PM +0800, Andy Chiu wrote:
> > > Hi Eric,
> > >
> > > On Thu, Nov 9, 2023 at 3:16 PM Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Nov 07, 2023 at 04:53:13PM +0800, Jerry Shih wrote:
> > > > > On Nov 2, 2023, at 13:16, Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> > > > > > On Thu, Oct 26, 2023 at 02:36:38AM +0800, Jerry Shih wrote:
> > > > > >> +static int ecb_encrypt(struct skcipher_request *req)
> > > > > >> +{
> > > > > >> + struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> > > > > >> + const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
> > > > > >> + struct skcipher_walk walk;
> > > > > >> + unsigned int nbytes;
> > > > > >> + int err;
> > > > > >> +
> > > > > >> + /* If we have error here, the `nbytes` will be zero. */
> > > > > >> + err = skcipher_walk_virt(&walk, req, false);
> > > > > >> + while ((nbytes = walk.nbytes)) {
> > > > > >> + kernel_vector_begin();
> > > > > >> + rv64i_zvkned_ecb_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
> > > > > >> + nbytes & AES_BLOCK_VALID_SIZE_MASK,
> > > > > >> + &ctx->key);
> > > > > >> + kernel_vector_end();
> > > > > >> + err = skcipher_walk_done(
> > > > > >> + &walk, nbytes & AES_BLOCK_REMAINING_SIZE_MASK);
> > > > > >> + }
> > > > > >> +
> > > > > >> + return err;
> > > > > >> +}
> > > > > >
> > > > > > There's no fallback for !crypto_simd_usable() here. I really like it this way.
> > > > > > However, for it to work (for skciphers and aeads), RISC-V needs to allow the
> > > > > > vector registers to be used in softirq context. Is that already the case?
> > > > >
> > > > > The kernel-mode-vector could be enabled in softirq, but we don't have nesting
> > > > > vector contexts. Will we have the case that kernel needs to jump to softirq for
> > > > > encryptions during the regular crypto function? If yes, we need to have fallbacks
> > > > > for all algorithms.
> > > >
> > > > Are you asking what happens if a softirq is taken while the CPU is between
> > > > kernel_vector_begin() and kernel_vector_end()? I think that needs to be
> > > > prevented by making kernel_vector_begin() and kernel_vector_end() disable and
> > > > re-enable softirqs, like what kernel_neon_begin() and kernel_neon_end() do on
> > > > arm64. Refer to commit 13150149aa6ded which implemented that behavior on arm64.
> > >
> > > Yes, if making Vector available to softirq context is a must, then it
> > > is reasonable to call local_bh_disable() in kernel_vector_begin().
> > > However, softirq would not be the only user for Vector and disabling
> > > it may cause extra latencies. Meanwhile, simply disabling bh in
> > > kernel_vector_begin() will conflict with the patch[1] that takes an
> > > approach to run Preemptible Vector. Though it is not clear yet on
> > > whether we should run Vector without turning off preemption, I have
> > > tested running preemptible Vector and observed some latency
> > > improvements without sacrificing throughput. We will have a discussion
> > > on LPC2023[2] and it'd be great if you could join or continue to
> > > discuss it here.
> > >
> > > Approaches can be done such as nesting, if running Vector in softirq
> > > is required. Since it requires extra save/restore on nesting, I think
> > > we should run some tests to get more performance (latency/throughput)
> > > figure let the result decide the final direction. For example, we
> > > could run Vector in either nesting with preempt-V and non-nesting
> > > without preempt-V and compare the following performance catachristics:
> > > - System-wide latency impact
> > > - Latency and throughput of softirq-Vector itself
> >
> > The skcipher and aead APIs do indeed need to work in softirq context.
> >
> > It's possible to use a fallback, either by falling back to scalar instructions
> > or by punting the encryption/decryption operation to a workqueue using
> > crypto/simd.c. However, both approaches have some significant disadvantages.
> > It was nice that the need for them on arm64 was eliminated by commit
> > 13150149aa6ded. Note that it's possible to yield the vector unit occasionally,
> > to keep preemption and softirqs from being disabled for too long.
> >
>
> It is also quite feasible to start out with an implementation of
> kernel_vector_begin() that preserves all vector registers eagerly in a
> special per-CPU allocation if the call is made in softirq context (and
> BUG when called in hardirq/NMI context). This was my initial approach
> on arm64 too.
>
> Assuming that RiSC-V systems with vector units are not flooding the
> market just yet, this gives you some time to study the issue without
> the need to implement non-vector fallback crypto algorithms
> everywhere.

Yes, that solution would be fine too.

- Eric