Re: [PATCH] mask ADT: new mask.h file [2/22]

From: Paul Jackson
Date: Tue Apr 06 2004 - 23:00:40 EST


Rusty suggested:
> 1) I think you only want the fastpath when it's eliminated by the
> compiler, so perhaps:
> if (__builtin_constant_p(nbits) && nbits <= BITS_PER_LONG)

It's not quite a cut and dried decision.

Define a 'small' system to be one where NR_CPUS <= BITS_PER_LONG, and a
'large' system to be those that aren't small.

If one had a bitmap operator called in a performance critical path on a
'small' system with a non-constant (unknown to the optimizer) bitmap
size, then one would _not_ want the above check for builtin_constant.

Leaving out the buildin_constant check results in code that checks the
variable size at runtime, performs the fast inline instructions on a
single word if it fits, else jumps to an out-of-line block of code that
calls the real __bitmap_op() function.

Since on a 'small' system, the bitmap size probably fits in a word, and
since on a performance critical path, it's worth checking for the fast
inline instruction opportunity, avoiding a jump out of line and avoiding
a real function call, the code obtained by leaving out the check for
builtin_constant is ideal.

However ... it costs you a half dozen machine instructions of text
space, for the out-of-line code block to handle the slow case that calls
the real __bitmap_op() function.

Since almost all systems are small, and since almost all bitmap operations
are not on critical paths, and since almost all bitmap operations are
called with a constant bitmap size, these half dozen machine instructions
are almost never worth a slug of warm spit.

Better to do just as Rusty suggests - if called with a non-constant
size, just say screw it and call the __bitmap_op() routine. While that
call might not have been necessary (might even be an 'issue' in a
performance critical path), it's not worth the half-dozen machine
instructions to find out.

So I guess the real question is:

Is it worth the slug of warm source code spit, the
extra __builtin_constant_p(nbits) condition, to get rid
of these six instructions for each bitmap_op() call
made with a non-constant bitmap size?

Or the real real question - is this discussion worth a slug of warm
lkml posts ... ;)?

===

Aha - I just built an 8 CPU SMP i386 config with my latest stuff.
Exactly one call appears to any __bitmap_* function, in the routine
bitmap_parse() of lib/bitmap.c:

bitmap_shift_right(maskp, maskp, CHUNKSZ, nmaskbits);

Ok - change this to the following, and save six text instructions
testing for the possibility that nmaskbits < BITS_PER_LONG:

__bitmap_shift_right(maskp, maskp, CHUNKSZ, nmaskbits);

There .. now the inline bitmap_* operators are _always_ seeing
constant bitmap sizes (for this exhaustive sampling of size one ;).

Conclusions:

1) To heck with the extra __builtin_constant_p(nbits) condition.

It's not worth polluting the source code with stuff to futz
with a six instruction space/time tradeoff in some situation
that doesn't yet exist, and if it did exist, might or might
not want such a tradeoff.

2) No - this post wasn't worth a slug of warm spit ;).

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@xxxxxxx> 1.650.933.1373
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/