Re: [PATCH] lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels

From: Nick Desaulniers
Date: Mon Aug 28 2023 - 16:16:01 EST


On Mon, Aug 28, 2023 at 9:30 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, 28 Aug 2023 at 03:53, David Laight <David.Laight@xxxxxxxxxx> wrote:
> >
> > From: Linus Torvalds
> > >
> > > We use this:
> > >
> > > static __always_inline unsigned long variable__ffs(unsigned long word)
> > > {
> > > 	asm("rep; bsf %1,%0"
> > > 		: "=r" (word)
> > > 		: "rm" (word));
> > > 	return word;
> > > }
> > >
> > > for the definition, and it looks like clang royally just screws up
> > > here. Yes, "m" is _allowed_ in that input set, but it damn well
> > > shouldn't be used for something that is already in a register, since
> > > "r" is also allowed, and is the first choice.
> >
> > Why don't we just remove the "m" option?
>
> For this particular case, it would probably be the right thing to do.
> It's sad, though, because gcc handles this correctly, and always has.
>
> And in this particular case, it probably matters not at all.
>
> In many other cases where we have 'rm', we may actually be in the
> situation that having 'rm' (or other cases like "g" that also allows
> immediates) helps because register pressure can be a thing.
>
> It's mostly a thing on 32-bit x86 where you have a lot fewer
> registers, and there we've literally run into situations where we have
> had internal compiler errors because of complex inline asm statements
> running out of registers.
>
> With a simple "one input, one output" case, that just isn't an issue,
> so to work around a clang misfeature we could do it - if somebody
> finds a case where it actually matters (as opposed to "damn, when
> looking at the generated code for a function that we never actually use
> on x86, I noticed that code generation is horrendous").
>
> Linus

Yes; it's a compiler bug, and we will fix it. Then the fix will be an
incentive for folks that care to move to a newer toolchain.
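
For illustration, here is a minimal sketch of the "r"-only variant David suggests. It assumes GCC/Clang extended asm. The name variable__ffs_r is hypothetical, and this is not the kernel's definition. Because only "r" is allowed, the compiler has to pass the operand in a register, so it cannot take the memory alternative that clang picked here. A __builtin_ctzl fallback lets the sketch build on non-x86 hosts.

```c
/*
 * Hypothetical "r"-only variant of variable__ffs() (not the kernel's code).
 * Without the "m" alternative, the input must be provided in a register,
 * so an already-live register value is never spilled just for the asm.
 * As with bsf, the result is undefined for word == 0; callers must check.
 */
static inline unsigned long variable__ffs_r(unsigned long word)
{
#if defined(__x86_64__) || defined(__i386__)
	/* "rep; bsf" encodes tzcnt; CPUs without BMI1 execute it as bsf,
	 * and the two agree for any nonzero input. */
	__asm__("rep; bsf %1,%0"
		: "=r" (word)
		: "r" (word));
	return word;
#else
	/* Portable fallback (assumption: GCC/Clang builtin) for non-x86. */
	return (unsigned long)__builtin_ctzl(word);
#endif
}
```

For example, variable__ffs_r(0x80UL) returns 7 and variable__ffs_r(0x600UL) returns 9, which is the index of the lowest set bit. The trade-off Linus describes still applies: on register-starved 32-bit x86, dropping "m" removes an option the compiler might need in more complex asm statements.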
--
Thanks,
~Nick Desaulniers