Re: [PATCH v6 0/2] x86/asm/bitops: optimize ff{s,z} functions for constant expressions

From: Vincent MAILHOL
Date: Thu Sep 01 2022 - 20:42:10 EST


On Thu. 1 Sep. 2022 at 23:19, Yury Norov <yury.norov@xxxxxxxxx> wrote:
> On Thu, Sep 01, 2022 at 07:30:10PM +0900, Vincent MAILHOL wrote:
> > On Tue. 1 sept. 2022 at 12:49, Yury Norov <yury.norov@xxxxxxxxx> wrote:
> > > On Wed, Aug 31, 2022 at 01:54:01AM -0700, Yury Norov wrote:
> > > > On Wed, Aug 31, 2022 at 04:57:40PM +0900, Vincent Mailhol wrote:
> > > > > The compilers provide builtin expressions equivalent to the ffs(),
> > > > > __ffs() and ffz() functions of the kernel. The kernel uses optimized
> > > > > assembly which produces better code than the builtin
> > > > > functions. However, such assembly code cannot be folded when used
> > > > > with constant expressions.
> > > > >
> > > > > This series relies on __builtin_constant_p to select the optimal solution:
> > > > >
> > > > > * use the kernel's assembly for non-constant expressions
> > > > >
> > > > > * use the compiler's __builtin functions for constant expressions.
> > > > >
> > > > >
> > > > > ** Statistics **
> > > > >
> > > > > Patch 1/2 optimizes 26.7% of ffs() calls and patch 2/2 optimizes 27.9%
> > > > > of __ffs() and ffz() calls (details of the calculation in each patch).
> > > >
> > > > Hi Vincent,
> > > >
> > > > Can you please add a test for this? We've recently added a very similar
> > > > test_bitmap_const_eval() in lib/test_bitmap.c.
> > > >
> > > > dc34d5036692c ("lib: test_bitmap: add compile-time optimization/evaluations
> > > > assertions")
> > > >
> > > > It would be nice to have something like this for ffs() and ffz() in
> > > > lib/test_bitops.c.
> > > >
> > > > Please keep me in the loop in case of new versions.
> >
> > Hi Yury,
> >
> > My patch only takes care of the x86 architecture.
>
> OK, I just realized that you started submitting this as far back as May.
>
> For me, v6 is good enough and well-described. So, for the series:
> Reviewed-by: Yury Norov <yury.norov@xxxxxxxxx>

Thanks for the review!

> How are you going to merge it? If you don't already have a specific tree
> in mind, I can take it in my bitmap tree because ffs and ffz are closely
> related to find_bit() functions.

I had not thought of a specific tree. I just CCed the x86 architecture
maintainers according to get_maintainer.pl and was expecting it to go
through the x86/asm branch of the tip tree. But I am perfectly fine if
it goes through your tree.

So, as Nick commented below, unless Borislav still has concerns about
v6, please take it through your tree.

> > If some other
> > architectures are not optimized yet, adding such a test might break
> > their builds. I am fine with adding the test; however, I will not write
> > patches for the other architectures because I do not have the
> > environment to compile and test them.
> >
> > Does it still make sense to add the test before fixing all the architectures?
>
> A fix for all arches should begin with changing the ffs() design. Namely, there
> should be a generic ffs() in include/linux/bitops.h,

Currently, the generic fls, ffs and ffz are under:
include/asm-generic/bitops/

In particular, here is the generic ffs():
https://elixir.bootlin.com/linux/latest/source/include/asm-generic/bitops/ffs.h

Isn't this sufficient?
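To illustrate, a rough userspace sketch in the spirit of that generic header's bisection approach (generic_ffs_sketch() is a hypothetical name; the linked file is the authoritative version). Because it is plain C, an optimizing compiler can typically fold it for constant arguments, which is exactly what the x86 inline assembly prevents:

```c
/*
 * Hypothetical sketch of a bisection-based ffs(), modeled on the approach
 * of include/asm-generic/bitops/ffs.h (not a copy of it).
 * Returns the 1-based index of the least significant set bit, or 0 if
 * x == 0, i.e. the same semantics as __builtin_ffs().
 */
static inline int generic_ffs_sketch(int x)
{
	unsigned int v = (unsigned int)x; /* avoid right-shifting a negative int */
	int r = 1;

	if (!v)
		return 0;
	/* Halve the search window at each step: 16, 8, 4, 2, then 1 bit. */
	if (!(v & 0xffffu)) { v >>= 16; r += 16; }
	if (!(v & 0xffu))   { v >>= 8;  r += 8; }
	if (!(v & 0xfu))    { v >>= 4;  r += 4; }
	if (!(v & 0x3u))    { v >>= 2;  r += 2; }
	if (!(v & 0x1u))    { r += 1; }
	return r;
}
```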

> and arch-specific
> arch__ffs() in arch/xxx/include/asm/bitops.h, like we do for the set_bit()
> family. I have a feeling that it's far beyond the scope of your series.
>
> The test is a different story. Good tests are always welcome, even if
> they don't cover all the arches.

ACK. I will add the test in a different patch *after* this series gets
accepted. But to be clear, I will not fix other architectures.
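For reference, a minimal userspace sketch of the selection described in the cover letter, for ffs(), __ffs() and ffz(): constant arguments go to the compiler builtins, everything else to a runtime fallback. sketch_ffs(), sketch___ffs(), sketch_ffz() and the variable_*() helpers are hypothetical names, and the loops only stand in for the x86 assembly. A kernel test would presumably go further and check at build time that the constant case folds, along the lines of test_bitmap_const_eval():

```c
#include <limits.h>

/* Hypothetical runtime fallback standing in for the x86 inline assembly. */
static inline int variable_ffs(int x)
{
	unsigned int v = (unsigned int)x;
	int pos = 1;

	if (!v)
		return 0;
	while (!(v & 1u)) {
		v >>= 1;
		pos++;
	}
	return pos;
}

/* Like the kernel's __ffs(): 0-based index, meaningless for word == 0. */
static inline unsigned long variable___ffs(unsigned long word)
{
	unsigned long pos = 0;

	/* The bound on pos only keeps this sketch from looping forever on 0. */
	while (!(word & 1ul) && pos < sizeof(word) * CHAR_BIT - 1) {
		word >>= 1;
		pos++;
	}
	return pos;
}

/*
 * __builtin_constant_p() does not evaluate its argument, so each macro
 * evaluates its argument exactly once, in whichever branch is selected.
 */
#define sketch_ffs(x) \
	(__builtin_constant_p(x) ? __builtin_ffs(x) : variable_ffs(x))

#define sketch___ffs(word)                                  \
	(__builtin_constant_p(word) ?                       \
		(unsigned long)__builtin_ctzl(word) :       \
		variable___ffs(word))

/* ffz(): first zero bit, i.e. the first set bit of the complement. */
#define sketch_ffz(word) sketch___ffs(~(word))
```

Both branches return the same value; the point of the dispatch is only that the constant branch can be folded at compile time.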

> > > Also, what about fls? Is there any difference from ffs/ffz with respect
> > > to compile-time optimizations? If not, it would be great if the series
> > > took care of it too.
> >
> > Agree. fls() and fls64() can use __builtin_clz() and
> > __builtin_clzll(). However, those two are a bit less trivial because
> > __builtin_clz() and __builtin_clzll() are undefined for 0 while fls(0)
> > must return 0. I wanted to get this series approved before
> > working on *fls*().
>
> OK, the test and fls() can be a matter of a follow-up series, taking
> into account how long these 2 patches have been in flight.

ACK.
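A rough userspace sketch of how fls() and fls64() map onto the count-leading-zeros builtins, including the explicit zero case (fls_sketch() and fls64_sketch() are hypothetical names, not the eventual kernel code):

```c
#include <limits.h>

/*
 * fls(): 1-based index of the most significant set bit, or 0 if x == 0.
 * __builtin_clz() is undefined for 0, so that case is handled explicitly.
 * By contrast, __builtin_ffs() already returns 0 for 0.
 */
static inline int fls_sketch(unsigned int x)
{
	return x ? (int)(sizeof(x) * CHAR_BIT) - __builtin_clz(x) : 0;
}

/* Same idea for 64-bit values, using __builtin_clzll(). */
static inline int fls64_sketch(unsigned long long x)
{
	return x ? (int)(sizeof(x) * CHAR_BIT) - __builtin_clzll(x) : 0;
}
```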

> Thanks,
> Yury