Re: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions

From: Vincent MAILHOL
Date: Sun Jan 28 2024 - 08:27:56 EST


On Sun. 28 Jan. 2024 at 21:16, David Laight <David.Laight@xxxxxxxxxx> wrote:
> From: Vincent MAILHOL
> > Sent: 28 January 2024 06:27
> >
> > On Sun. 28 Jan. 2024 at 14:39, Finn Thain <fthain@xxxxxxxxxxxxxx> wrote:
> > > On Sun, 28 Jan 2024, Vincent Mailhol wrote:
> > >
> > > > The compiler is not able to do constant folding on "asm volatile" code.
> > > >
> > > > Evaluate whether or not the function argument is a constant expression
> > > > and if this is the case, return an equivalent builtin expression.
> > > >
> ...
> > > If the builtin has the desired behaviour, why do we reimplement it in asm?
> > > Shouldn't we abandon one or the other to avoid having to prove (and
> > > maintain) their equivalence?
> >
> > The asm is meant to produce better results when the argument is not a
> > constant expression. The commit below is a good illustration of why we
> > want both the asm and the builtin:
> >
> > https://git.kernel.org/torvalds/c/146034fed6ee
> >
> > I say "is meant", because I did not verify whether this is still true.
> > Note that there are some cases in which the asm is not better anymore,
> > for example, see this thread:
> >
> > https://lore.kernel.org/lkml/20221106095106.849154-2-mailhol.vincent@xxxxxxxxxx/
> >
> > but I did not receive more answers, so I stopped trying to investigate
> > the subject.
> >
> > If you want, you can check the produced assembly of both the asm and
> > the builtin for both clang and gcc, and if the builtin is always
> > either better or equivalent, then the asm can be removed. That said, I
> > am not spending more effort there after being ghosted once (c.f. above
> > thread).
>
> I don't see any example there of why the __builtin_xxx() versions
> shouldn't be used all the time.
> (The x86-64 asm blocks contain unrelated call instructions and objdump
> wasn't passed -d to show what they were.
> One even has the 'return thunk' pessimisation showing.)

Fair. My goal was not to point to the assembly code but to this sentence:

  However, for non constant expressions, the kernel's ffs() asm
  version remains better for x86_64 because, contrary to GCC, it
  doesn't emit the CMOV assembly instruction

I should have been clearer. Sorry about that.

But the fact remains: on x86, some of the asm versions produced better
optimized code than the builtins.
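
For reference, the approach discussed here (builtin for constant
arguments, asm otherwise) can be sketched as below. This is only an
illustration, assuming __builtin_constant_p() is used for the check:
arch_ffs_sketch() is a hypothetical, portable stand-in for the real
"asm volatile" implementation, and none of these names come from the
patch.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the arch-specific "asm volatile" version.
 * A portable loop is used so that the sketch builds on any target.
 * Returns the 1-based index of the least significant set bit, or 0.
 */
static inline int arch_ffs_sketch(int x)
{
	unsigned int v = (unsigned int)x;
	int pos = 1;

	if (!v)
		return 0;
	while (!(v & 1U)) {
		v >>= 1;
		pos++;
	}
	return pos;
}

/*
 * Constant argument: use the builtin so the compiler can fold it.
 * Non-constant argument: use the arch-specific path.
 */
#define ffs_sketch(x) \
	(__builtin_constant_p(x) ? __builtin_ffs(x) : arch_ffs_sketch(x))

/* Check that both paths agree on a few sample values. */
static inline void ffs_sketch_selftest(void)
{
	static const int samples[] = { 0, 1, 2, 8, 0x100, 0x7fffffff, -1 };
	unsigned int i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(arch_ffs_sketch(samples[i]) ==
		       __builtin_ffs(samples[i]));
}
```

The self-test only spot checks a handful of values; it does not replace
comparing the generated assembly.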

> I actually suspect the asm versions predate the builtins.

This seems true. The __builtins were introduced in:

generic: Implement generic ffs/fls using __builtin_* functions
https://git.kernel.org/torvalds/c/048fa2df92c3

when the asm implementation already existed in m68k.

> Does (or can) the outer common header use the __builtin functions
> if no asm version exists?

Yes, this would be extremely easy. You just need to

#include <asm-generic/bitops/builtin-__ffs.h>
#include <asm-generic/bitops/builtin-ffs.h>
#include <asm-generic/bitops/builtin-__fls.h>
#include <asm-generic/bitops/builtin-fls.h>

and remove all the asm implementations. If you give me the green
light, I can make that change; it is trivial. The only thing I am not
ready to do is compare the generated assembly and confirm whether
removing the asm code is actually better.
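
To give an idea of what the builtin-only route looks like: as far as I
recall, the generic helpers are thin wrappers along the lines below.
This is a sketch and not the verbatim header content; the *_sketch
names are hypothetical and chosen to avoid clashing with the real
ffs(), fls(), __ffs() and __fls().

```c
#include <assert.h>

/* ffs(): 1-based index of the least significant set bit, 0 if x == 0. */
static inline int ffs_sketch(int x)
{
	return __builtin_ffs(x);
}

/* fls(): 1-based index of the most significant set bit, 0 if x == 0. */
static inline int fls_sketch(unsigned int x)
{
	return x ? (int)(sizeof(x) * 8) - __builtin_clz(x) : 0;
}

/* __ffs(): 0-based index of the least significant set bit.
 * Undefined for word == 0, so callers must check first. */
static inline unsigned long ffs_long_sketch(unsigned long word)
{
	return (unsigned long)__builtin_ctzl(word);
}

/* __fls(): 0-based index of the most significant set bit.
 * Undefined for word == 0, so callers must check first. */
static inline unsigned long fls_long_sketch(unsigned long word)
{
	return (sizeof(word) * 8) - 1 - (unsigned long)__builtin_clzl(word);
}

/* Spot check of the conventions documented above. */
static inline void bitops_sketch_selftest(void)
{
	assert(ffs_sketch(0) == 0 && ffs_sketch(8) == 4);
	assert(fls_sketch(0) == 0 && fls_sketch(8) == 4);
	assert(ffs_long_sketch(8UL) == 3 && fls_long_sketch(8UL) == 3);
}
```

Note the two conventions: ffs()/fls() are 1-based and accept zero,
while __ffs()/__fls() are 0-based and undefined for zero.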

Thoughts?

Yours sincerely,
Vincent Mailhol