Re: [PATCH] lib/clz_ctz.c: Fix clzdi2() and ctzdi2() for 32-bit kernels

From: Bill Wendling
Date: Fri Aug 25 2023 - 20:10:16 EST

Next message: Abhinav Kumar: "Re: [PATCH v4 2/4] drm/msm/dpu: Enable widebus for DSI INTF"
Previous message: Jakub Kicinski: "Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA"
In reply to: Linus Torvalds: "Re: [PATCH] lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels"
Next in thread: Nick Desaulniers: "Re: [PATCH] lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, Aug 25, 2023 at 4:35 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, 25 Aug 2023 at 15:57, Bill Wendling <morbo@xxxxxxxxxx> wrote:
> > >
> > Another idea is that there are __builtin_* functions for a lot of
> > functions that are currently in inline asm
>
> No. We've been through this before. The builtins are almost entirely
> untested, and often undocumented and buggy.
>
> > The major issue with the
> > `__builtin_ia32_readeflags_*` was its inability to take unrelated MSRs
> > into account during code motion. That may not be the same worry here?
>
> No, the problem with __builtin_ia32_readeflags_*() was that it was
> literally completely buggy and generated entirely broken code:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104971
>
> but that's really more of a symptom than anything else.
>
> It's a symptom of the fact that unlike inline asm's, those builtins
> are often undocumented in what compiler version they appeared, and are
> of very questionable quality. They often don't have many users, and
> the test suites are non-existent.
>
> For example, we *do* use __builtin_ffs() on x86 for constant values,
> because the compiler does the right thing.
>
> But for non-constant ones, the inline asm actually generates better
> code: gcc generatea some disgusting mess with a 'bsf' followed by a
> 'cmov' for the zero case, when we know better.
>
> See for example
>
> https://godbolt.org/z/jKKf48Wsf
>
Ew...gross.

> I don't understand why compiler people prefer a builtin that is an
> untested special case that assumes that the compiler knows what is
> going on (and often doesn't), over a generic escape facility that is
> supported and needed anyway (inline asm).
>
> In other words: the statement "builtins generate better code" is
> simply PROVABLY NOT TRUE.
>
> Builtins have often generated *worse* code than using inline asms, to
> the point where "worse" is actively buggy crap.
>
> At least inline asms are reliable. That's a *big* deal.
>
There are a couple of reasons why compiler writers (at least I) prefer
builtins to inline asm. Inline asm takes control away from the
compiler, which makes it harder for the compiler to perform normal
optimizations. It's more portable. As for better code, it won't
generate it in all situations, as you pointed out, but can typically
generate as good code.

Inline asm has its own issues---portability, difficult to use
constraints (the EFLAGS asms went back and forth on its constraints
over the years), and, from what I remember, GNU's inline asm is
closely tied to its register allocator, making it hard to support in
other compilers. Plus, there's not guarantee that an inline asm won't
be moved. (It's been discussed before, and I also believe that the
current Linux inline asm "barriers" should prevent this. I just vastly
prefer guaranteed behavior over "it works now". I've migrated many
systems to new compilers (and even compiler versions) and have seen
"well, it worked for the previous compiler" types of bugs that were
actual bugs in the programmer's code.)

Okay, none of what I said is going to convince you to use builtins,
and I'm not suggesting we do. It's just a few of the reasons why we
tend to prefer them to other methods. Just take this as an FYI. :-)

-bw

Next message: Abhinav Kumar: "Re: [PATCH v4 2/4] drm/msm/dpu: Enable widebus for DSI INTF"
Previous message: Jakub Kicinski: "Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA"
In reply to: Linus Torvalds: "Re: [PATCH] lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels"
Next in thread: Nick Desaulniers: "Re: [PATCH] lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Re: [PATCH] lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels

Re: [PATCH] lib/clz_ctz.c: Fix clzdi2() and ctzdi2() for 32-bit kernels