Re: __ucmpdi2

From: Linus Torvalds (torvalds@transmeta.com)
Date: Wed Sep 20 2000 - 13:02:32 EST


In article <10009192200.ZM278817@classic.engr.sgi.com>,
Jeremy Higdon <jeremy@classic.engr.sgi.com> wrote:
>> - Linux developers often do horribly stupid things, and use 64-bit
>> division etc instead of using a simple shift. Again, not linking
>> against libgcc finds those things early rather than late, because the
>> horribly stupid things end up requiring libgcc support.
>
>I would have thought that the compiler would generate a shift if it
>could (I'm presuming you're talking about shifting by a constant
>here -- or are you talking about code that always shifts by a
>computed power of two).

The compiler is smart, but the compiler doesn't have ESP.

For example, what some filesystems did was basically

        blocknumber = offset_64bit / filesystem->blocksize;

which is not optimizable, because while it _is_ a division by a power
of two, gcc has no way of knowing that, nor _what_ power of two. Gcc
doesn't know that the ext2 blocksize is 1024, 2048 or 4096 ;)

The fix is to hit such Linux developers virtually on the head (by having
a kernel that doesn't link ;), and rewrite the code as

        blocknumber = offset_64bit >> filesystem->blocksize_bits;

which does exactly the same thing, except it is about a hundred times
faster.

See?

>> In the case of __ucmpdi2, it appears to be a combination of kernel and
>> compiler stupidity - there's no reason why __ucmpdi2 should not be done
>> inline by the compiler, but at the same time there is probably also
>> little reason to use a slow "long long" comparison in the kernel.
>
>Little reason or no reason? If there is a reason, and it doesn't
>work, then the coder is forced to rewrite using 32 bit variables,
>synthesizing the result. Then you have belabored C code as well
>as belabored machine code, and it doesn't automatically clean up
>when you move to a 64 bit machine.

Oh, but usually it does.

For example, most of the time these things are due to issues like

        if (offset >= page_offset(page))
                ...

where page_offset() is simply "(unsigned long long)page->index <<
PAGE_CACHE_SHIFT".

Very readable, no?

But it doesn't get any worse by doing the comparison the other way
around, and instead doing

        if (index(offset) >= page->index)

which is faster (because now you have only one long long shift, not two
shifts and a comparison), and equally readable (yeah, you have to think
about it for a bit if you want to convince yourself that it's the same
thing due to the low-order bits you lost, but in many cases where we did
this conversion the end result was _more_ readable, because the end
result was that we always worked on index+offset parts, and there was no
confusion).

And on 64-bit machines the code is exactly the same too. No slow-down.

This was why I hated the original LFS patches. They mindlessly just
increased a lot of stuff to 64 bits, with no regard for what the code
really _meant_. I ended up re-writing the core code completely before
LFS was accepted into the 2.3.x series - using page index calculations
instead, which meant that most of the actual critical routines _still_
did the same old 32-bit calculations, they just did them with the part
of the values that really mattered - thus giving effectively a 44 bit
address space.

And btw, doing it this way means that on the alpha we could potentially
have a "77-bit address space" for file mapping. So yes, it actually
means other improvements too - even for 64-bit machines.

(Now, the 77-bit address space that the new VM potentially gives to
64-bit architectures is only useful for people who use the page cache
directly, because obviously file sizes are still just 64-bit integers.
But it could be useful for the internal implementation of distributed
memory, for example.)

Ehh.. Basically, my argument boils down to the old truism: by thinking
about the problem and doing the smart thing, you can often do more with
less work.

>So what we've said is: 64 bit is okay, except in a switch statement,
>or other random expressions that might cause gcc to embed a call to
>similar libgcc function.

No, what Linux really says is that you should avoid using "long long"
(and thus 64-bit values), because on many architectures it is slower.

And I further say that it is usually very easy to avoid it.

But you shouldn't go overboard. Simple "long long" arithmetic is useful
and easy, even on 32-bit platforms. The kernel does quite a lot of it,
as all file offsets are basically 64 bits. But by thinking about the
problem some more, you can often limit it to those simple operations,
which are fast anyway.

                        Linus



This archive was generated by hypermail 2b29 : Sat Sep 23 2000 - 21:00:23 EST