RE: [PATCH v4 10/24] crypto: x86/poly - limit FPU preemption

From: Elliott, Robert (Servers)
Date: Fri Dec 16 2022 - 17:12:39 EST



> I'll keep experimenting with all the preempt modes, heavier
> workloads, and shorter RCU timeouts to confirm this solution
> is robust. It might even be appropriate for the generic
> drivers, if they suffer from the problems that sm4 shows here.

I have a set of patches that's looking promising. It's no longer
generating RCU stall warnings or soft lockups with either x86
drivers or generic drivers (sm4 is particularly taxing).

Test case:
* added 28 clones of the tcrypt module so modprobe can run it
many times in parallel (1 thread per CPU core)
* added 1 MiB big buffer functional tests (compare to
generic results)
* added 1 MiB big buffer speed tests
* 3 windows running
* 28 threads running
* modprobe with each defined test mode in order 1, 2, 3, etc.
* RCU stall timeouts set to shortest supported values
* run in preempt=none, preempt=voluntary, preempt=full modes

Patches include:
* Ard's kmap_local() patch
* Suppress RCU stall warnings during speed tests. Change the
rcu_sysrq_start()/end() functions to be general purpose and
call them from tcrypt test functions that measure the time of
a crypto operation
* add crypto_yield() unconditionally in skcipher_walk_done() so
it is run even if data is aligned
* add crypto_yield() in aead_encrypt/decrypt so they always
call it like skcipher
* add crypto_yield() at the end of each hash update(), digest(),
and finup() function so they always call it like skcipher
* add kernel_fpu_yield() calls every 4 KiB inside x86
kernel_fpu_begin()/end() blocks, so the x86 functions always
yield to the scheduler even when they're bypassing those
helper functions (that now call crypto_yield() more
consistently)
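
To illustrate the last item, here is a rough userspace sketch of the
4 KiB chunking pattern. It is not the patch itself:
kernel_fpu_begin()/kernel_fpu_end() are replaced by counting stubs,
kernel_fpu_yield() is the helper named above (also stubbed, since its
real implementation is not shown in this message), and simd_update(),
hash_update_chunked(), and FPU_CHUNK_BYTES are hypothetical names
made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB between yield points, as described in the list above. */
#define FPU_CHUNK_BYTES 4096u

/* Userspace stand-ins so the sketch compiles outside the kernel. */
static unsigned int fpu_sections;	/* number of begin/end sections entered */
static unsigned int fpu_yields;		/* number of yield points offered */
static int fpu_depth;			/* nonzero while "inside" an FPU section */

static void kernel_fpu_begin(void) { fpu_depth++; fpu_sections++; }
static void kernel_fpu_end(void)   { fpu_depth--; }
/* Stub: the real helper would let the scheduler run if needed. */
static void kernel_fpu_yield(void) { fpu_yields++; }

/* Hypothetical stand-in for a SIMD routine: folds bytes into a checksum. */
static uint32_t simd_update(uint32_t state, const uint8_t *data, size_t len)
{
	for (size_t i = 0; i < len; i++)
		state = state * 31u + data[i];
	return state;
}

/*
 * Process the whole buffer inside one begin/end section, but offer a
 * yield point after every FPU_CHUNK_BYTES so a large request cannot
 * hold the CPU for the entire buffer.
 */
static uint32_t hash_update_chunked(uint32_t state, const uint8_t *data,
				    size_t len)
{
	kernel_fpu_begin();
	while (len) {
		size_t chunk = len < FPU_CHUNK_BYTES ? len : FPU_CHUNK_BYTES;

		state = simd_update(state, data, chunk);
		data += chunk;
		len -= chunk;

		/* No yield after the final chunk; kernel_fpu_end() follows. */
		if (len)
			kernel_fpu_yield();
	}
	kernel_fpu_end();
	return state;
}
```

With one of the 1 MiB test buffers, this loop handles 256 chunks
inside a single begin/end section and offers 255 yield points, while
producing the same result as a single unchunked call.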

I'll keep trying to break it over the weekend. If it holds
up I'll post the patches next week.