[PATCH 00/13] crypto: x86 - yield FPU context during long loops

From: Robert Elliott
Date: Mon Dec 19 2022 - 17:04:09 EST


This is an offshoot of the previous patch series at:
https://lore.kernel.org/linux-crypto/20221219202910.3063036-1-elliott@xxxxxxx

Add a kernel_fpu_yield() function for x86 crypto drivers to call
periodically during long loops.
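
As a rough illustration of the calling pattern, here is a userspace
sketch. It is not the code from this series: the body of
kernel_fpu_yield() is an assumption (release the FPU, reschedule if
needed, reacquire), the kernel primitives are replaced by trivial
stand-ins so the example builds outside the kernel, and toy_update(),
CHUNK_BYTES, and the counters are hypothetical names used only here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for kernel primitives (assumptions, not kernel code). */
static int fpu_depth;         /* 1 while the FPU section is held */
static int yield_count;       /* times the FPU section was released mid-loop */
static bool resched_pending;  /* models what need_resched() would report */

static void kernel_fpu_begin(void) { assert(fpu_depth == 0); fpu_depth++; }
static void kernel_fpu_end(void)   { assert(fpu_depth == 1); fpu_depth--; }
static bool need_resched(void)     { return resched_pending; }
static void cond_resched(void)     { resched_pending = false; }

/* Assumed shape of the helper: give up the FPU only when a reschedule is due. */
static void kernel_fpu_yield(void)
{
	if (need_resched()) {
		kernel_fpu_end();
		cond_resched();
		kernel_fpu_begin();
		yield_count++;
	}
}

#define CHUNK_BYTES 4096u	/* hypothetical amount of work between yield checks */

/* Hypothetical driver loop: a byte sum stands in for the SIMD routine. */
static uint32_t toy_update(uint32_t state, const uint8_t *data, size_t len)
{
	kernel_fpu_begin();
	while (len) {
		size_t n = len < CHUNK_BYTES ? len : CHUNK_BYTES;

		for (size_t i = 0; i < n; i++)
			state += data[i];
		data += n;
		len -= n;

		/* Check between chunks; skip the check once the data is consumed. */
		if (len)
			kernel_fpu_yield();
	}
	kernel_fpu_end();
	return state;
}
```

The point of the pattern is that the FPU section is entered once and
only dropped when the scheduler actually wants the CPU, rather than
being ended and restarted unconditionally after every chunk.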

Test results
============
I created 28 tcrypt modules so modprobe can run concurrent tests,
added 1 MiB functional and speed tests to tcrypt, and ran three processes
spawning 28 subprocesses (one per physical CPU core) each looping forever
through all the tcrypt test modes. This keeps the system quite busy,
generating RCU stalls and soft lockups during both generic and x86
crypto function processing.

In conjunction with these patch series:
* [PATCH 0/8] crypto: kernel-doc for assembly language
https://lore.kernel.org/linux-crypto/20221219185555.433233-1-elliott@xxxxxxx
* [PATCH 0/3] crypto/rcu: suppress unnecessary CPU stall warnings
https://lore.kernel.org/linux-crypto/20221219202910.3063036-1-elliott@xxxxxxx
* [PATCH 0/3] crypto: yield at end of operations
https://lore.kernel.org/linux-crypto/20221219203733.3063192-1-elliott@xxxxxxx

while using the default RCU values (60 s stalls, 21 s expedited stalls),
several nights of testing did not result in any RCU stall warnings or soft
lockups in any of these preemption modes:
preempt=none
preempt=voluntary
preempt=full

Setting the shortest possible RCU timeouts (3 s, 20 ms) still resulted
in RCU stalls, but only about one every 2 hours, and not tied to
particular modules like sha512_ssse3 and sm4-generic.

systemd usually crashes and restarts when its journal becomes full from
all the tcrypt printk messages. Without the patches, that triggered more
RCU stall reports and soft lockups; with the patches, only userspace
seems perturbed.


Robert Elliott (13):
x86: protect simd.h header file
x86: add yield FPU context utility function
crypto: x86/sha - yield FPU context during long loops
crypto: x86/crc - yield FPU context during long loops
crypto: x86/sm3 - yield FPU context during long loops
crypto: x86/ghash - use u8 rather than char
crypto: x86/ghash - restructure FPU context saving
crypto: x86/ghash - yield FPU context during long loops
crypto: x86/poly - yield FPU context only when needed
crypto: x86/aegis - yield FPU context during long loops
crypto: x86/blake - yield FPU context only when needed
crypto: x86/chacha - yield FPU context only when needed
crypto: x86/aria - yield FPU context only when needed

arch/x86/crypto/aegis128-aesni-glue.c | 49 ++++++---
arch/x86/crypto/aria_aesni_avx_glue.c | 7 +-
arch/x86/crypto/blake2s-glue.c | 41 +++----
arch/x86/crypto/chacha_glue.c | 22 ++--
arch/x86/crypto/crc32-pclmul_glue.c | 49 +++++----
arch/x86/crypto/crc32c-intel_glue.c | 118 ++++++++++++++------
arch/x86/crypto/crct10dif-pclmul_glue.c | 65 ++++++++---
arch/x86/crypto/ghash-clmulni-intel_asm.S | 6 +-
arch/x86/crypto/ghash-clmulni-intel_glue.c | 37 +++++--
arch/x86/crypto/nhpoly1305-avx2-glue.c | 22 ++--
arch/x86/crypto/nhpoly1305-sse2-glue.c | 22 ++--
arch/x86/crypto/poly1305_glue.c | 47 ++++----
arch/x86/crypto/polyval-clmulni_glue.c | 46 +++++---
arch/x86/crypto/sha1_avx2_x86_64_asm.S | 6 +-
arch/x86/crypto/sha1_ni_asm.S | 8 +-
arch/x86/crypto/sha1_ssse3_glue.c | 120 +++++++++++++++++----
arch/x86/crypto/sha256_ni_asm.S | 8 +-
arch/x86/crypto/sha256_ssse3_glue.c | 115 ++++++++++++++++----
arch/x86/crypto/sha512_ssse3_glue.c | 89 ++++++++++++---
arch/x86/crypto/sm3_avx_glue.c | 34 +++++-
arch/x86/include/asm/simd.h | 23 ++++
include/crypto/internal/blake2s.h | 8 +-
lib/crypto/blake2s-generic.c | 12 +--
23 files changed, 687 insertions(+), 267 deletions(-)

--
2.38.1