RE: [PATCH] riscv: lib: Optimize 'strlen' function

From: David Laight
Date: Sun Dec 17 2023 - 12:01:30 EST

Next message: Naresh Maramaina: "Re: [PATCH V5 0/2] Add CPU latency QoS support for ufs driver"
Previous message: Vinod Koul: "[GIT PULL]: soundwire fixes for v6.7"
Next in thread: Ivan Orlov: "Re: [PATCH] riscv: lib: Optimize 'strlen' function"

From: Ivan Orlov
> Sent: 13 December 2023 15:46
>
> The current non-ZBB implementation of 'strlen' function iterates the
> memory bytewise, looking for a zero byte. It could be optimized to use
> the wordwise iteration instead, so we will process 4/8 bytes of memory
> at a time.
...
> 1. If the address is unaligned, iterate SZREG - (address % SZREG) bytes
> to align it.

An alternative is to mask the address down to a word boundary and 'or'
non-zero values into the bytes of the first word that precede the
string - might be faster.
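
A rough userspace sketch of that idea, not the patch's code and not
tested on RISC-V. It assumes a little-endian machine, so the bytes that
precede the string sit in the low-order end of the first word. The names
strlen_masked(), has_zero_byte() and first_zero_index() are made up for
illustration. It reads whole aligned words, so it can read bytes before
and after the string, the same as any word-at-a-time strlen() does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WORD_ONES  ((unsigned long)-1 / 0xff)	/* 0x0101...01 */
#define WORD_HIGHS (WORD_ONES * 0x80)		/* 0x8080...80 */

/* Non-zero if and only if at least one byte of w is zero. */
static unsigned long has_zero_byte(unsigned long w)
{
	return (w - WORD_ONES) & ~w & WORD_HIGHS;
}

/* Index of the lowest-addressed zero byte of w (little-endian only).
 * Must only be called when w is known to contain a zero byte. */
static size_t first_zero_index(unsigned long w)
{
	size_t i = 0;

	while (w & 0xff) {
		w >>= 8;
		i++;
	}
	return i;
}

size_t strlen_masked(const char *s)
{
	size_t offset = (uintptr_t)s % sizeof(unsigned long);
	const char *p = s - offset;	/* word-aligned address at or below s */
	unsigned long w;

	memcpy(&w, p, sizeof(w));	/* aligned load of the first word */

	/* Force the bytes in front of the string to be non-zero so that
	 * they can never be mistaken for the terminator. */
	if (offset)
		w |= (1UL << (8 * offset)) - 1;

	while (!has_zero_byte(w)) {
		p += sizeof(w);
		memcpy(&w, p, sizeof(w));
	}

	return (size_t)((p + first_zero_index(w)) - s);
}
```

The point is that there is no separate bytewise loop for the unaligned
head: the first iteration is the same as all the others apart from one
extra 'or'. Whether that wins on real hardware would need measuring.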

...
> Here you can find the benchmarking results for the VisionFive2 board
> comparing the old and new implementations of the strlen function.
>
> Size: 1 (+-0), mean_old: 673, mean_new: 666
> Size: 2 (+-0), mean_old: 672, mean_new: 676
> Size: 4 (+-0), mean_old: 685, mean_new: 659
> Size: 8 (+-0), mean_old: 682, mean_new: 673
> Size: 16 (+-0), mean_old: 718, mean_new: 694
...

Is that 32-bit or 64-bit?
The word-at-a-time strlen() is typically not worth it for 32-bit.

I'd also guess that pretty much all the calls in-kernel are short.
You might try counting as: histogram[ilog2(strlen_result)]++
and seeing what it shows for some workload.
I bet you (a beer if I see you!) that you won't see many over 1k.
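
Something like this userspace sketch shows the counting I have in mind.
The wrapper counted_strlen(), the helper ilog2_sz() and the separate
zero_length counter are made up for illustration (in the kernel you would
use ilog2() from <linux/log2.h>). Empty strings are counted separately on
the assumption that ilog2(0) is not meaningful.

```c
#include <stddef.h>
#include <string.h>

#define NBUCKETS (sizeof(size_t) * 8)

/* histogram[n] counts results with 2^n <= length < 2^(n+1). */
static unsigned long histogram[NBUCKETS];
static unsigned long zero_length;	/* empty strings, counted apart */

/* floor(log2(v)) for v > 0; stand-in for the kernel's ilog2(). */
static unsigned int ilog2_sz(size_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Drop-in wrapper: same result as strlen(), plus the bookkeeping. */
size_t counted_strlen(const char *s)
{
	size_t len = strlen(s);

	if (len == 0)
		zero_length++;
	else
		histogram[ilog2_sz(len)]++;
	return len;
}
```

Lengths 1k and above land in histogram[10] and up, so those are the
buckets to look at after running the workload.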

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
