Re: [PATCH v2 0/2] RISC-V: Optimize memset for data sizes less than 16 bytes

From: Andrew Jones
Date: Thu May 11 2023 - 03:44:48 EST


On Thu, May 11, 2023 at 09:26:04AM +0800, zhangfei wrote:
> From: zhangfei <zhangfei@xxxxxxxxxxxxxx>
>
> At present, the memset implementation uses byte-by-byte stores when
> processing tail data or when the initial data size is less than 16 bytes.
> This approach is not efficient. Therefore, I fill the head and tail with minimal
> branching. Each conditional ensures that all the subsequently used offsets are
> well-defined and in the dest region. Although this approach may result in
> redundant stores, compared to byte-by-byte stores it allows store instructions
> to be executed in parallel, reduces the number of jumps, and ultimately
> improves performance.
>
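For readers unfamiliar with the technique, the head/tail fill described
above can be sketched in C roughly as follows. This is only an illustration
of the overlapping-store idea, not the assembly in the patch; the function
name small_memset and the 16-byte upper bound are assumptions made for the
example.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: fill n <= 16 bytes without a per-byte loop.
 * Each size check guarantees that every offset used after it lies
 * inside [0, n). Some bytes are written more than once, which is
 * harmless because every store writes the same value.
 */
static void *small_memset(void *dest, int c, size_t n)
{
	unsigned char *s = dest;
	unsigned char b = (unsigned char)c;

	if (n == 0)
		return dest;
	s[0] = b;
	s[n - 1] = b;
	if (n <= 2)
		return dest;

	s[1] = b;
	s[2] = b;
	s[n - 2] = b;
	s[n - 3] = b;
	if (n <= 6)
		return dest;

	s[3] = b;
	s[n - 4] = b;
	if (n <= 8)
		return dest;

	/* 9 <= n <= 16: the first and last eight bytes cover the region. */
	s[4] = b;
	s[5] = b;
	s[6] = b;
	s[7] = b;
	s[n - 5] = b;
	s[n - 6] = b;
	s[n - 7] = b;
	s[n - 8] = b;
	return dest;
}
```

The stores within each group do not depend on one another, so a core can
issue them back to back, and at most four branches are taken for any size
in this range.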
> I used the code linked below for performance testing and commented out the
> calls to the Arm-specific memset implementations so that it runs properly on
> the RISC-V platform.
>
> [1] https://github.com/ARM-software/optimized-routines/blob/master/string/bench/memset.c#L53
>
> The test platform is a RISC-V SiFive U74. The test data is as follows:
>
> Before optimization
> ---------------------
> Random memset (bytes/ns):
> memset_call 32K:0.45 64K:0.35 128K:0.30 256K:0.28 512K:0.27 1024K:0.25 avg 0.30
>
> Medium memset (bytes/ns):
> memset_call 8B:0.18 16B:0.48 32B:0.91 64B:1.63 128B:2.71 256B:4.40 512B:5.67
>
> Large memset (bytes/ns):
> memset_call 1K:6.62 2K:7.02 4K:7.46 8K:7.70 16K:7.82 32K:7.63 64K:1.40
>
> After optimization
> ---------------------
> Random memset (bytes/ns):
> memset_call 32K:0.46 64K:0.35 128K:0.30 256K:0.28 512K:0.27 1024K:0.25 avg 0.31
>
> Medium memset (bytes/ns):
> memset_call 8B:0.27 16B:0.48 32B:0.91 64B:1.64 128B:2.71 256B:4.40 512B:5.67
>
> Large memset (bytes/ns):
> memset_call 1K:6.62 2K:7.02 4K:7.47 8K:7.71 16K:7.83 32K:7.63 64K:1.40
>
> The results show that memset performance improves significantly for data
> sizes of around 8B, from 0.18 bytes/ns to 0.27 bytes/ns.
>
> The previous work was as follows:
> 1. "[PATCH] riscv: Optimize memset"
> 6d1cbe2e.3c31d.187eb14d990.Coremail.zhangfei@xxxxxxxxxxxxxx

Cover letters should have a changelog, in this case a couple phrases
stating what's different in v2 vs. v1.

Thanks,
drew

>
> Thanks,
> Fei Zhang
>
> Andrew Jones (1):
> RISC-V: lib: Improve memset assembler formatting
>
> arch/riscv/lib/memset.S | 143 ++++++++++++++++++++--------------------
> 1 file changed, 72 insertions(+), 71 deletions(-)
>
> zhangfei (1):
> RISC-V: lib: Optimize memset performance
>
> arch/riscv/lib/memset.S | 40 +++++++++++++++++++++++++++++++++++++---
> 1 file changed, 37 insertions(+), 3 deletions(-)
>