Re: [PATCH 5/6] crypto: x86/sm3 - add AVX assembly implementation

From: Jussi Kivilinna
Date: Mon Dec 20 2021 - 13:04:04 EST


On 20.12.2021 10.22, Tianjia Zhang wrote:
This patch adds AVX assembly accelerated implementation of SM3 secure
hash algorithm. From the benchmark data, compared to pure software
implementation sm3-generic, the performance increase is up to 38%.

The main algorithm implementation based on SM3 AES/BMI2 accelerated
work by libgcrypt at:
https://gnupg.org/software/libgcrypt/index.html

Benchmark on Intel i5-6200U 2.30GHz, performance data of two
implementations, pure software sm3-generic and sm3-avx acceleration.
The data comes from the 326 mode and 422 mode of tcrypt. The abscissas
are different lengths of per update. The data is tabulated and the
unit is Mb/s:

update-size | 16 64 256 1024 2048 4096 8192
--------------------------------------------------------------------
sm3-generic | 105.97 129.60 182.12 189.62 188.06 193.66 194.88
sm3-avx | 119.87 163.05 244.44 260.92 257.60 264.87 265.88

Signed-off-by: Tianjia Zhang <tianjia.zhang@xxxxxxxxxxxxxxxxx>
---
arch/x86/crypto/Makefile | 3 +
arch/x86/crypto/sm3-avx-asm_64.S | 521 +++++++++++++++++++++++++++++++
arch/x86/crypto/sm3_avx_glue.c | 134 ++++++++
crypto/Kconfig | 13 +
4 files changed, 671 insertions(+)
create mode 100644 arch/x86/crypto/sm3-avx-asm_64.S
create mode 100644 arch/x86/crypto/sm3_avx_glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index f307c93fc90a..7cbe860f6201 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -88,6 +88,9 @@ nhpoly1305-avx2-y := nh-avx2-x86_64.o nhpoly1305-avx2-glue.o
obj-$(CONFIG_CRYPTO_CURVE25519_X86) += curve25519-x86_64.o
+obj-$(CONFIG_CRYPTO_SM3_AVX_X86_64) += sm3-avx-x86_64.o
+sm3-avx-x86_64-y := sm3-avx-asm_64.o sm3_avx_glue.o
+
obj-$(CONFIG_CRYPTO_SM4_AESNI_AVX_X86_64) += sm4-aesni-avx-x86_64.o
sm4-aesni-avx-x86_64-y := sm4-aesni-avx-asm_64.o sm4_aesni_avx_glue.o
diff --git a/arch/x86/crypto/sm3-avx-asm_64.S b/arch/x86/crypto/sm3-avx-asm_64.S
new file mode 100644
index 000000000000..e7a9a37f3609
--- /dev/null
+++ b/arch/x86/crypto/sm3-avx-asm_64.S
@@ -0,0 +1,521 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * SM3 AVX accelerated transform.
+ * specified in: https://datatracker.ietf.org/doc/html/draft-sca-cfrg-sm3-02
+ *
+ * Copyright (C) 2021 Jussi Kivilinna <jussi.kivilinna@xxxxxx>
+ * Copyright (C) 2021 Tianjia Zhang <tianjia.zhang@xxxxxxxxxxxxxxxxx>
+ */
<snip>
+
+#define R(i, a, b, c, d, e, f, g, h, round, widx, wtype) \
+ /* rol(a, 12) => t0 */ \
+ roll3mov(12, a, t0); /* rorxl here would reduce perf by 6% on zen3 */ \
+ /* rol (t0 + e + t), 7) => t1 */ \
+ addl3(t0, e, t1); \
+ addl $K##round, t1; \

It's better to use "leal K##round(t0, e, 1), t1;" here and fix K0-K63 macros
instead as I noted at libgcrypt mailing-list:
https://lists.gnupg.org/pipermail/gcrypt-devel/2021-December/005209.html

-Jussi