RE: [PATCH v8 1/5] crypto: aspeed: Add HACE hash driver

From: Neal Liu
Date: Tue Aug 09 2022 - 03:39:27 EST


> >> On 2022/7/26 19:34, Neal Liu wrote:
> >>> Hash and Crypto Engine (HACE) is designed to accelerate the
> >>> throughput of hash data digest, encryption, and decryption.
> >>>
> >>> Basically, HACE can be divided into two independently engines
> >>> - Hash Engine and Crypto Engine. This patch aims to add HACE hash
> >>> engine driver for hash accelerator.
> >>>
> >>> Signed-off-by: Neal Liu <neal_liu@xxxxxxxxxxxxxx>
> >>> Signed-off-by: Johnny Huang <johnny_huang@xxxxxxxxxxxxxx>
> >>> ---
> >>> MAINTAINERS | 7 +
> >>> drivers/crypto/Kconfig | 1 +
> >>> drivers/crypto/Makefile | 1 +
> >>> drivers/crypto/aspeed/Kconfig | 32 +
> >>> drivers/crypto/aspeed/Makefile | 6 +
> >>> drivers/crypto/aspeed/aspeed-hace-hash.c | 1389
> >> ++++++++++++++++++++++
> >>> drivers/crypto/aspeed/aspeed-hace.c | 213 ++++
> >>> drivers/crypto/aspeed/aspeed-hace.h | 186 +++
> >>> 8 files changed, 1835 insertions(+) create mode 100644
> >>> drivers/crypto/aspeed/Kconfig create mode 100644
> >>> drivers/crypto/aspeed/Makefile create mode 100644
> >>> drivers/crypto/aspeed/aspeed-hace-hash.c
> >>> create mode 100644 drivers/crypto/aspeed/aspeed-hace.c
> >>> create mode 100644 drivers/crypto/aspeed/aspeed-hace.h
> >>>
> >>> diff --git a/MAINTAINERS b/MAINTAINERS index
> >>> f55aea311af5..23a0215b7e42 100644
> >>> --- a/MAINTAINERS
> >>> +++ b/MAINTAINERS
> >>> @@ -3140,6 +3140,13 @@ S: Maintained
> >>> F: Documentation/devicetree/bindings/media/aspeed-video.txt
> >>> F: drivers/media/platform/aspeed/
> >>>
> >>> +ASPEED CRYPTO DRIVER
> >>> +M: Neal Liu <neal_liu@xxxxxxxxxxxxxx>
> >>> +L: linux-aspeed@xxxxxxxxxxxxxxxx (moderated for non-subscribers)
> >>> +S: Maintained
> >>> +F:
> >> Documentation/devicetree/bindings/crypto/aspeed,ast2500-hace.yaml
> >>> +F: drivers/crypto/aspeed/
> >>> +
> >>> ASUS NOTEBOOKS AND EEEPC ACPI/WMI EXTRAS DRIVERS
> >>> M: Corentin Chary <corentin.chary@xxxxxxxxx>
> >>> L: acpi4asus-user@xxxxxxxxxxxxxxxxxxxxx
> >>> diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig index
> >>> ee99c02c84e8..b9f5ee126881 100644
> >>> --- a/drivers/crypto/Kconfig
> >>> +++ b/drivers/crypto/Kconfig
> >>> @@ -933,5 +933,6 @@ config CRYPTO_DEV_SA2UL
> >>> acceleration for cryptographic algorithms on these devices.
> >>>
> >>> source "drivers/crypto/keembay/Kconfig"
> >>> +source "drivers/crypto/aspeed/Kconfig"
> >>>
> >>> endif # CRYPTO_HW
> >>> diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile index
> >>> f81703a86b98..116de173a66c 100644
> >>> --- a/drivers/crypto/Makefile
> >>> +++ b/drivers/crypto/Makefile
> >>> @@ -1,5 +1,6 @@
> >>> # SPDX-License-Identifier: GPL-2.0
> >>> obj-$(CONFIG_CRYPTO_DEV_ALLWINNER) += allwinner/
> >>> +obj-$(CONFIG_CRYPTO_DEV_ASPEED) += aspeed/
> >>> obj-$(CONFIG_CRYPTO_DEV_ATMEL_AES) += atmel-aes.o
> >>> obj-$(CONFIG_CRYPTO_DEV_ATMEL_SHA) += atmel-sha.o
> >>> obj-$(CONFIG_CRYPTO_DEV_ATMEL_TDES) += atmel-tdes.o diff --git
> >>> a/drivers/crypto/aspeed/Kconfig b/drivers/crypto/aspeed/Kconfig new
> >>> file mode 100644 index 000000000000..059e627efef8
> >>> --- /dev/null
> >>> +++ b/drivers/crypto/aspeed/Kconfig
> >>> @@ -0,0 +1,32 @@
> >>> +config CRYPTO_DEV_ASPEED
> >>> + tristate "Support for Aspeed cryptographic engine driver"
> >>> + depends on ARCH_ASPEED
> >>> + help
> >>> + Hash and Crypto Engine (HACE) is designed to accelerate the
> >>> + throughput of hash data digest, encryption and decryption.
> >>> +
> >>> + Select y here to have support for the cryptographic driver
> >>> + available on Aspeed SoC.
> >>> +
> >>> +config CRYPTO_DEV_ASPEED_HACE_HASH
> >>> + bool "Enable Aspeed Hash & Crypto Engine (HACE) hash"
> >>> + depends on CRYPTO_DEV_ASPEED
> >>> + select CRYPTO_ENGINE
> >>> + select CRYPTO_SHA1
> >>> + select CRYPTO_SHA256
> >>> + select CRYPTO_SHA512
> >>> + select CRYPTO_HMAC
> >>> + help
> >>> + Select here to enable Aspeed Hash & Crypto Engine (HACE)
> >>> + hash driver.
> >>> + Supports multiple message digest standards, including
> >>> + SHA-1, SHA-224, SHA-256, SHA-384, SHA-512, and so on.
> >>> +
> >>> +config CRYPTO_DEV_ASPEED_HACE_HASH_DEBUG
> >>> + bool "Enable HACE hash debug messages"
> >>> + depends on CRYPTO_DEV_ASPEED_HACE_HASH
> >>> + help
> >>> + Print HACE hash debugging messages if you use this option
> >>> + to ask for those messages.
> >>> + Avoid enabling this option for production build to
> >>> + minimize driver timing.
> >>> diff --git a/drivers/crypto/aspeed/Makefile
> >> b/drivers/crypto/aspeed/Makefile
> >>> new file mode 100644
> >>> index 000000000000..8bc8d4fed5a9
> >>> --- /dev/null
> >>> +++ b/drivers/crypto/aspeed/Makefile
> >>> @@ -0,0 +1,6 @@
> >>> +obj-$(CONFIG_CRYPTO_DEV_ASPEED) += aspeed_crypto.o
> >>> +aspeed_crypto-objs := aspeed-hace.o \
> >>> + $(hace-hash-y)
> >>> +
> >>> +obj-$(CONFIG_CRYPTO_DEV_ASPEED_HACE_HASH) +=
> aspeed-hace-hash.o
> >>> +hace-hash-$(CONFIG_CRYPTO_DEV_ASPEED_HACE_HASH) :=
> >> aspeed-hace-hash.o
> >>> diff --git a/drivers/crypto/aspeed/aspeed-hace-hash.c
> >> b/drivers/crypto/aspeed/aspeed-hace-hash.c
> >>> new file mode 100644
> >>> index 000000000000..63a8ad694996
> >>> --- /dev/null
> >>> +++ b/drivers/crypto/aspeed/aspeed-hace-hash.c
> >>> @@ -0,0 +1,1389 @@

[...]

> >>> +static int aspeed_ahash_dma_prepare_sg(struct aspeed_hace_dev
> >> *hace_dev)
> >>> +{
> >>> + struct aspeed_engine_hash *hash_engine =
> &hace_dev->hash_engine;
> >>> + struct ahash_request *req = hash_engine->req;
> >>> + struct aspeed_sham_reqctx *rctx = ahash_request_ctx(req);
> >>> + struct aspeed_sg_list *src_list;
> >>> + struct scatterlist *s;
> >>> + int length, remain, sg_len, i;
> >>> + int rc = 0;
> >>> +
> >>> + remain = (rctx->total + rctx->bufcnt) % rctx->block_size;
> >>> + length = rctx->total + rctx->bufcnt - remain;
> >>> +
> >>> + AHASH_DBG(hace_dev, "%s:0x%x, %s:0x%x, %s:0x%x, %s:0x%x\n",
> >>> + "rctx total", rctx->total, "bufcnt", rctx->bufcnt,
> >>> + "length", length, "remain", remain);
> >>> +
> >>> + sg_len = dma_map_sg(hace_dev->dev, rctx->src_sg, rctx->src_nents,
> >>> + DMA_TO_DEVICE);
> >>> + if (!sg_len) {
> >>> + dev_warn(hace_dev->dev, "dma_map_sg() src error\n");
> >>> + rc = -ENOMEM;
> >>> + goto end;
> >>> + }
> >>> +
> >>> + src_list = (struct aspeed_sg_list *)hash_engine->ahash_src_addr;
> >>> + rctx->digest_dma_addr = dma_map_single(hace_dev->dev,
> rctx->digest,
> >>> + SHA512_DIGEST_SIZE,
> >>> + DMA_BIDIRECTIONAL);
> >>> + if (dma_mapping_error(hace_dev->dev, rctx->digest_dma_addr)) {
> >>> + dev_warn(hace_dev->dev, "dma_map() rctx digest error\n");
> >>> + rc = -ENOMEM;
> >>> + goto free_src_sg;
> >>> + }
> >>> +
> >>> + if (rctx->bufcnt != 0) {
> >>> + rctx->buffer_dma_addr = dma_map_single(hace_dev->dev,
> >>> + rctx->buffer,
> >>> + rctx->block_size * 2,
> >>> + DMA_TO_DEVICE);
> >>> + if (dma_mapping_error(hace_dev->dev,
> rctx->buffer_dma_addr)) {
> >>> + dev_warn(hace_dev->dev, "dma_map() rctx buffer
> error\n");
> >>> + rc = -ENOMEM;
> >>> + goto free_rctx_digest;
> >>> + }
> >>> +
> >>> + src_list[0].phy_addr = rctx->buffer_dma_addr;
> >>> + src_list[0].len = rctx->bufcnt;
> >>> + length -= src_list[0].len;
> >>> +
> >>> + /* Last sg list */
> >>> + if (length == 0)
> >>> + src_list[0].len |= HASH_SG_LAST_LIST;
> >>> +
> >>> + src_list[0].phy_addr = cpu_to_le32(src_list[0].phy_addr);
> >>> + src_list[0].len = cpu_to_le32(src_list[0].len);
> >>> + src_list++;
> >>> + }
> >>> +
> >>> + if (length != 0) {
> >>> + for_each_sg(rctx->src_sg, s, sg_len, i) {
> >>> + src_list[i].phy_addr = sg_dma_address(s);
> >>> +
> >>> + if (length > sg_dma_len(s)) {
> >>> + src_list[i].len = sg_dma_len(s);
> >>> + length -= sg_dma_len(s);
> >>> +
> >>> + } else {
> >>> + /* Last sg list */
> >>> + src_list[i].len = length;
> >>> + src_list[i].len |= HASH_SG_LAST_LIST;
> >>> + length = 0;
> >>> + }
> >>> +
> >>> + src_list[i].phy_addr = cpu_to_le32(src_list[i].phy_addr);
> >>> + src_list[i].len = cpu_to_le32(src_list[i].len);
> >>> + }
> >>> + }
> >>> +
> >>> + if (length != 0) {
> >>> + rc = -EINVAL;
> >>> + goto free_rctx_buffer;
> >>> + }
> >>> +
> >>> + rctx->offset = rctx->total - remain;
> >>> + hash_engine->src_length = rctx->total + rctx->bufcnt - remain;
> >>> + hash_engine->src_dma = hash_engine->ahash_src_dma_addr;
> >>> + hash_engine->digest_dma = rctx->digest_dma_addr;
> >>> +
> >>> + goto end;
> >> Exiting via "goto xx" is not recommended in normal code logic (this
> >> requires two jumps), exiting via "return 0" is more efficient.
> >> This code method has many times in your entire driver, it is
> >> recommended to modify it.
> >
> > If not exiting via "goto xx", how to release related resources without any
> problem?
> > Is there any proper way to do this?
> maybe I didn't describe it clearly enough.
> "in normal code logic" means rc=0
> In this scenario (rc=0), "goto xx" is no longer required, it can be replaced with
> "return 0"

Okay, I got your point. In this case, "goto end" is no longer required of course.
I would send next patch with this fixed included.

> >
> >>> +
> >>> +free_rctx_buffer:
> >>> + if (rctx->bufcnt != 0)
> >>> + dma_unmap_single(hace_dev->dev, rctx->buffer_dma_addr,
> >>> + rctx->block_size * 2, DMA_TO_DEVICE);
> >>> +free_rctx_digest:
> >>> + dma_unmap_single(hace_dev->dev, rctx->digest_dma_addr,
> >>> + SHA512_DIGEST_SIZE, DMA_BIDIRECTIONAL);
> >>> +free_src_sg:
> >>> + dma_unmap_sg(hace_dev->dev, rctx->src_sg, rctx->src_nents,
> >>> + DMA_TO_DEVICE);
> >>> +end:
> >>> + return rc;
> >>> +}
> >>> +
> >>> +static int aspeed_ahash_complete(struct aspeed_hace_dev *hace_dev)
> >>> +{
> >>> + struct aspeed_engine_hash *hash_engine =
> &hace_dev->hash_engine;
> >>> + struct ahash_request *req = hash_engine->req;
> >>> +
> >>> + AHASH_DBG(hace_dev, "\n");
> >>> +
> >>> + hash_engine->flags &= ~CRYPTO_FLAGS_BUSY;
> >>> +
> >>> + crypto_finalize_hash_request(hace_dev->crypt_engine_hash, req, 0);
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +/*
> >>> + * Copy digest to the corresponding request result.
> >>> + * This function will be called at final() stage.
> >>> + */
> >>> +static int aspeed_ahash_transfer(struct aspeed_hace_dev *hace_dev)
> >>> +{
> >>> + struct aspeed_engine_hash *hash_engine =
> &hace_dev->hash_engine;
> >>> + struct ahash_request *req = hash_engine->req;
> >>> + struct aspeed_sham_reqctx *rctx = ahash_request_ctx(req);
> >>> +
> >>> + AHASH_DBG(hace_dev, "\n");
> >>> +
> >>> + dma_unmap_single(hace_dev->dev, rctx->digest_dma_addr,
> >>> + SHA512_DIGEST_SIZE, DMA_BIDIRECTIONAL);
> >>> +
> >>> + dma_unmap_single(hace_dev->dev, rctx->buffer_dma_addr,
> >>> + rctx->block_size * 2, DMA_TO_DEVICE);
> >>> +
> >>> + memcpy(req->result, rctx->digest, rctx->digsize);
> >>> +
> >>> + return aspeed_ahash_complete(hace_dev); }
> >>> +
> >>> +/*
> >>> + * Trigger hardware engines to do the math.
> >>> + */
> >>> +static int aspeed_hace_ahash_trigger(struct aspeed_hace_dev
> *hace_dev,
> >>> + aspeed_hace_fn_t resume)
> >>> +{
> >>> + struct aspeed_engine_hash *hash_engine =
> &hace_dev->hash_engine;
> >>> + struct ahash_request *req = hash_engine->req;
> >>> + struct aspeed_sham_reqctx *rctx = ahash_request_ctx(req);
> >>> +
> >>> + AHASH_DBG(hace_dev, "src_dma:0x%x, digest_dma:0x%x,
> >> length:0x%x\n",
> >>> + hash_engine->src_dma, hash_engine->digest_dma,
> >>> + hash_engine->src_length);
> >>> +
> >>> + rctx->cmd |= HASH_CMD_INT_ENABLE;
> >>> + hash_engine->resume = resume;
> >>> +
> >>> + ast_hace_write(hace_dev, hash_engine->src_dma,
> >> ASPEED_HACE_HASH_SRC);
> >>> + ast_hace_write(hace_dev, hash_engine->digest_dma,
> >>> + ASPEED_HACE_HASH_DIGEST_BUFF);
> >>> + ast_hace_write(hace_dev, hash_engine->digest_dma,
> >>> + ASPEED_HACE_HASH_KEY_BUFF);
> >>> + ast_hace_write(hace_dev, hash_engine->src_length,
> >>> + ASPEED_HACE_HASH_DATA_LEN);
> >>> +
> >>> + /* Memory barrier to ensure all data setup before engine starts */
> >>> + mb();
> >>> +
> >>> + ast_hace_write(hace_dev, rctx->cmd, ASPEED_HACE_HASH_CMD);
> >> A hardware service sending requires 5 hardware commands to complete.
> >> In a multi-concurrency scenario, how to ensure the order of commands?
> >> (If two processes send hardware task at the same time, How to ensure
> >> that the hardware recognizes which task the current command belongs
> >> to?)
> >
> > Linux crypto engine would guarantee that only one request at each time to
> be dequeued from engine queue to process.
> > And there has lock mechanism inside Linux crypto engine to prevent the
> scenario you mentioned.
> > So only 1 aspeed_hace_ahash_trigger() hardware service would go through
> at a time.
> >
> > [...]
> > .
> >
> You may not understand what I mean, the command flow in a normal scenario:
> request_A: Acmd1-->Acmd2-->Acmd3-->Acmd4-->Acmd5
> request_B: Bcmd1-->Bcmd2-->Bcmd3-->Bcmd4-->Bcmd5
> In a multi-process concurrent scenario, multiple crypto engines can be enabled,
> and each crypto engine sends a request. If multiple requests here enter
> aspeed_hace_ahash_trigger() at the same time, the command flow will be
> intertwined like this:
> request_A, request_B:
> Acmd1-->Bcmd1-->Acmd2-->Acmd3-->Bcmd2-->Acmd4-->Bcmd3-->Bcmd4-->A
> cmd5-->Bcmd5
>
> In this command flow, how does your hardware identify whether these
> commands belong to request_A or request_B?
> Thanks.
> Longfang.

For my understanding, all requests will transfer into engine queue through crypto_transfer_hash_request_to_engine().
In your example, request_A & request_B would also enqueue into the engine queue, and pump out 1 request which might be FIFO to handle it.
crypto_pump_requests() will dequeue only 1 request at a time and to prepare_request() & do_one_request() if it's registered.
And aspeed_hace_ahash_trigger() is inside do_one_request(), so that means no other requests would come in during aspeed_hace_ahash_trigger() whole process.
The command flow intertwined
Acmd1-->Bcmd1-->Acmd2-->Acmd3-->Bcmd2-->Acmd4-->Bcmd3-->Bcmd4-->Acmd5-->Bcmd5 would not exist in any scenario.
Correct me if I'm misunderstanding, Thanks.