Re: [PATCH] riscv: Optimize memset

From: zhangfei
Date: Wed May 10 2023 - 21:43:35 EST


From: zhangfei <zhangfei@xxxxxxxxxxxxxx>

On Wed, May 10, 2023 at 14:58:22PM +0200, Andrew Jones wrote:
> On Wed, May 10, 2023 at 11:52:43AM +0800, zhangfei wrote:
> > From: zhangfei <zhangfei@xxxxxxxxxxxxxx>
> >
> > On Tue, May 09, 2023 11:16:33AM +0200, Andrew Jones wrote:
> > > On Tue, May 09, 2023 at 10:22:07AM +0800, zhangfei wrote:
> > > >
> > > > Hi,
> > > >
> > > > I filled head and tail with minimal branching. Each conditional ensures that
> > > > all the subsequently used offsets are well-defined and in the dest region.
> > >
> > > I know. You trimmed my comment, so I'll quote myself, here
> > >
> > > """
> > > After the check of a2 against 6 above we know that offsets 6(t0)
> > > and -7(a3) are safe. Are we trying to avoid too many redundant
> > > stores with these additional checks?
> > > """
> > >
> > > So, again. Why the additional check against 8 above and, the one you
> > > trimmed, checking 10?
> >
> > Hi,
> >
> > These additional checks are to avoid too many redundant stores.
> >
> > Adding a check for more than 8 bytes is because after the loop
> > segment '3' comes out, the remaining bytes are less than 8 bytes,
> > which also avoids redundant stores.
>
> So the benchmarks showed these additional checks were necessary to avoid
> making memset worse? Please add comments to the code explaining the
> purpose of the checks.

Hi,

As you suspected, dropping these additional checks makes memset worse.
When I removed the checks against 8 and 10 above, the benchmark showed
memset at 0.21 bytes/ns for an 8B fill. That is still better than storing
byte by byte, but with the additional checks in place it improves to
0.27 bytes/ns.
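
To make the head/tail idea easier to follow, here is a rough C sketch of it.
This is only an illustration and not the assembly from the patch: the names
fill_head_tail and self_test are hypothetical, and the thresholds 2, 6 and 8
are assumptions picked for the sketch, so they need not match the exact
checks (6, 8, 10) discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not the patch itself: fill n bytes at dest with c.
 * Head and tail bytes are stored in pairs; each size check guarantees that
 * the offsets used after it are inside the destination region.
 */
static void *fill_head_tail(void *dest, int c, size_t n)
{
	unsigned char *s = dest;
	unsigned char b = (unsigned char)c;
	size_t i;

	if (n == 0)
		return dest;

	/* n >= 1: first and last byte are valid (they overlap when n == 1). */
	s[0] = b;
	s[n - 1] = b;
	if (n <= 2)
		return dest;

	/* n >= 3: offsets 1, 2, n-2 and n-3 are valid and may overlap. */
	s[1] = b;
	s[2] = b;
	s[n - 2] = b;
	s[n - 3] = b;
	if (n <= 6)
		return dest;

	/* n >= 7: offsets 3 and n-4 are valid. */
	s[3] = b;
	s[n - 4] = b;
	if (n <= 8)
		return dest;

	/* n >= 9: bytes 0..3 and n-4..n-1 are done, fill the middle. */
	for (i = 4; i < n - 4; i++)
		s[i] = b;

	return dest;
}

/* Compare against memset() for a range of sizes, with guard bytes around. */
static void self_test(void)
{
	unsigned char got[64], want[64];
	size_t n;

	for (n = 0; n <= 48; n++) {
		memset(got, 0xAA, sizeof(got));
		memset(want, 0xAA, sizeof(want));
		fill_head_tail(got + 8, 0x5C, n);
		memset(want + 8, 0x5C, n);
		assert(memcmp(got, want, sizeof(got)) == 0);
	}
}
```

In this sketch the early returns let small sizes finish after a few
overlapping stores instead of continuing to issue stores that only rewrite
bytes already written.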

I apologize for the confusing response in my previous email. I have
reorganized the changes into patch v2 and sent it out. Please reply to the
latest patch.

Thanks,
Fei Zhang