Re: [PATCH] lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels

From: Linus Torvalds
Date: Mon Aug 28 2023 - 12:25:50 EST


On Mon, 28 Aug 2023 at 00:33, Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
>
> Several architectures (incl. x86, but excl. amd64) do build the kernel with
> -ffreestanding.
>
> IIRC, the issue was that without that, gcc was "optimizing" calls
> to standard functions (implemented as inline optimized assembler
> functions) by replacing them with calls to other standard functions
> (also implemented as inline optimized assembler functions).

So using -ffreestanding is definitely the right thing to do for a
kernel in theory. It's very much supposed to tell the compiler to not
assume a standard libc, and without that gcc will do various
transformations that make sense when you "know" what libc does, but
may not make sense in the limited library model of a kernel.

So without it, gcc will do things like converting a 'printf()' call
without any conversion characters to a much cheaper 'puts()' etc. Now,
we often avoid that issue entirely by having our own function names
(e.g. printk()), but we do tend to use the standard names for the
*really* core C library functions.
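To make that concrete, here is a minimal sketch, assuming a hosted GCC build with optimization enabled. greet() and copy_abc() are hypothetical names, not kernel code. Both calls are candidates for the kind of "replace one standard function with another" rewrite described above; -ffreestanding, which implies -fno-builtin, turns these rewrites off.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical example, not kernel code. In a hosted, optimizing GCC
 * build this call may be emitted as puts("hello"): a format string
 * with no conversions that ends in '\n' prints the same text that
 * puts() would, and puts() skips format parsing entirely.
 */
void greet(void)
{
	printf("hello\n");
}

/*
 * Same idea, one standard function traded for another: the source
 * length is known at compile time, so GCC may emit this as
 * memcpy(dst, "abc", 4), or even as plain stores. With -ffreestanding
 * (and therefore -fno-builtin) neither rewrite is done.
 */
void copy_abc(char *dst)
{
	strcpy(dst, "abc");
}
```

Whether a given rewrite actually happens depends on the compiler version and optimization level, so the comments say "may" rather than "will".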

Anyway, it turns out that some of the things you miss out on with
-ffreestanding are kind of important. In particular, at least gcc will
then stop treating 'memcpy()' as a builtin, so it no longer inlines
even small fixed-size copies, which ends up being pretty horrendous.
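A minimal sketch of the kind of code that suffers, assuming GCC or Clang. read_u32() and read_u32_builtin() are hypothetical helpers, not kernel functions. The first is the common idiom for an unaligned load. The second shows the usual workaround of spelling out __builtin_memcpy(), which -fno-builtin does not disable, at the cost of no longer being able to just write memcpy().

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: read a 32-bit value from a possibly unaligned
 * pointer. When GCC treats memcpy() as a builtin, this constant 4-byte
 * copy typically becomes a single load. Under -ffreestanding (which
 * implies -fno-builtin) it stays a call to the out-of-line memcpy().
 */
static inline uint32_t read_u32(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/*
 * -fno-builtin only affects names without the __builtin_ prefix, so
 * naming the builtin explicitly still lets the compiler expand the
 * copy inline even in a freestanding build.
 */
static inline uint32_t read_u32_builtin(const void *p)
{
	uint32_t v;

	__builtin_memcpy(&v, p, sizeof(v));
	return v;
}
```

Both helpers return the same value; the difference is only in the code the compiler is allowed to generate for them.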

So while -ffreestanding would be the right thing to do in theory, in
practice it's actually pretty horrible. It's a big hammer that affects
a lot of things, and while many of them make sense for a kernel, some
of them are really bad. Which is why x86-64 no longer uses it.

I would actually suggest other architectures take a look if they care
at all about code generation. In particular, look at the x86-64
version of 'string.h' in

arch/x86/include/asm/string_64.h

and note the difference with the 32-bit one. The 32-bit one is the
"this is how we used to do it" that nobody cared enough to change. The
64-bit one is much simpler and actually generates better code simply
because gcc recognizes memcpy() and friends, and will then inline it
when small etc.

The *downside* is that now you have to trust the compiler to do the
right thing. And that will depend on compiler version etc. There's a
reason why 32-bit x86 does everything by hand: when your compiler
history starts at gcc-1.40, things are simply *very* different from
when you can rely on gcc-5.1 and newer...

Put another way: gcc has changed, and what used to make sense probably
doesn't make sense any more.

Linus