Re: Faster make depend

Matti E Aarnio
Sat, 27 Jul 1996 19:42:43 +0300 (EET DST)

Linus Torvalds wrote:
> On Fri, 26 Jul 1996 wrote:
> > Wonder why the Alpha doesn't have idivs, btw?
> Because it would mess up the integer pipeline, and the alpha designers felt
> (probably correctly) that integer divide wasn't important enough to make the
> pipelines more complex.

In the GENERAL case division is an iterative operation.
FP division has to do the exponent handling in addition
to the division of the mantissas.
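
To make "iterative" concrete, here is a minimal sketch of the
textbook shift-and-subtract scheme, one quotient bit per step.
The function name is made up for illustration, and this is not
a claim about how any particular chip or software routine does
it; it assumes d != 0.

```c
#include <stdint.h>

/* Shift-and-subtract (restoring) division: 32 steps, one
   quotient bit per step.  Assumes d != 0. */
uint32_t shift_subtract_div(uint32_t n, uint32_t d)
{
    uint64_t r = 0;   /* running remainder, always < d after each step */
    uint32_t q = 0;   /* quotient, built from the top bit down */

    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1);   /* bring down the next bit of n */
        if (r >= d) {                    /* does the divisor fit? */
            r -= d;
            q |= (uint32_t)1 << i;       /* yes: this quotient bit is 1 */
        }
    }
    return q;         /* at this point r holds the remainder */
}
```

With this sketch shift_subtract_div(100, 7) gives 14, and the
loop always takes 32 steps no matter what the operands are.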

To speed up division it is possible to build logic that
calculates more than one quotient bit per division step.
However, one can do an even faster iterative division
with an ultra-fast multiplier :-)
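
One common way to divide with a multiplier is Newton-Raphson
iteration on the reciprocal. The sketch below is only an
illustration under assumptions: the name nr_reciprocal is
invented, the input is taken to be a mantissa already
normalized to [0.5, 1) (the exponent is handled separately),
and the starting guess is the usual linear one. The two
constants are folded at compile time, so the loop itself uses
only multiplies and subtracts.

```c
/* Newton-Raphson reciprocal: approximates 1/m for a mantissa m
   normalized to 0.5 <= m < 1, using only multiply and subtract
   at run time. */
double nr_reciprocal(double m)
{
    /* Linear first guess; relative error is at most 1/17. */
    double x = 48.0 / 17.0 - (32.0 / 17.0) * m;

    /* Each step roughly squares the relative error, so four
       steps take 1/17 down to below double precision. */
    for (int i = 0; i < 4; i++)
        x = x * (2.0 - m * x);

    return x;
}
```

A quotient n/m is then n * nr_reciprocal(m), up to the last
bit or so of rounding.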

For MULTIPLIERS there exist efficient (and old!) designs
from Seymour Cray, which are widely used in modern systems.
Signal processors running at 30-60 MHz often multiply in
one clock, for example. Alphas do it in two clocks in
pipelined mode.
(Sure, the Wallace-tree flash multiplier is an N**2 design,
but the popularity of multiplication gives reason to
invest in that amount of logic.)

In addition, calculating the inverse ("1/x") of a constant
(at compile time) and then multiplying by it is (on Alphas)
blindingly fast... Sure, it overflows, but the result is
correct for division :-) You can't get the remainder out
of that method, though.
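
A small sketch of the reciprocal-multiply trick for one
well-known case, unsigned 32-bit division by 10. The function
name is hypothetical; the constant 0xCCCCCCCD is
ceil(2**35 / 10), and the product is formed in 64 bits so the
high part can be shifted down. Other divisors and word sizes
need their own constant and shift.

```c
#include <stdint.h>

/* x / 10 for any 32-bit unsigned x, with one multiply and one
   shift instead of a divide.  0xCCCCCCCD == ceil(2^35 / 10). */
uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}
```

For example div10(1999999999) gives 199999999, the same as
1999999999 / 10.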

If you need a lot of divisions by the same variable, you are
better off calculating its modular inverse modulo 2**64
(well, the register size handles the modulo operation :) )
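
A minimal sketch of the modular-inverse approach, with two
assumptions that need stating loudly: the inverse modulo 2**64
exists only for ODD divisors, and multiplying by it yields the
true quotient only when the division is exact (no remainder).
The function names are invented for illustration.

```c
#include <stdint.h>

/* Multiplicative inverse of an ODD d modulo 2^64, by Newton
   iteration.  Unsigned wraparound does the "mod 2^64" for free. */
uint64_t inverse_mod_2_64(uint64_t d)
{
    uint64_t x = d;          /* d*d == 1 (mod 8): 3 correct low bits */

    /* Each step doubles the number of correct low bits:
       3 -> 6 -> 12 -> 24 -> 48 -> 96, so 5 steps cover 64 bits. */
    for (int i = 0; i < 5; i++)
        x *= 2 - d * x;

    return x;
}

/* n / d as a single multiply, given d_inv = inverse_mod_2_64(d).
   Correct only when d is odd and divides n exactly. */
uint64_t exact_div(uint64_t n, uint64_t d_inv)
{
    return n * d_inv;
}
```

The inverse is computed once and reused, so every later
division by the same value costs one multiply.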

> So idivs are done in software, with little (but as you found out, some)
> degradation. Thanks to that the rest of the chip is simpler and thus easier
> to do a fast implementation,

Indeed, and if you look closer at the fp-div, it stalls the
pipe for a long time, so one can't issue fp-divisions very
frequently -- was it 20-40 (input dependent) clocks ?

> Linus

/Matti Aarnio

PS: I recall once having benchmarked integer and FP operations
on an IBM 3033H16, and was very surprised to find them to
be apparently of the same speed -- addition and square-root
took the same time! A closer look at the hardware revealed
a special square-root engine adjacent to the ALU.
And divisions were apparently equally fast...
(That system was the top-of-the-line scalar engine in its time,
so no wonder it had such accelerators suitable for scientific
and engineering applications...)