Re: IP Checksumming

Tom May (ftom@netcom.com)
21 Nov 1996 11:24:24 -0800


"Richard B. Johnson" <root@analogic.com> writes:

> If the loop is within the cache-line there is no penalty. The timing
> is as specified in the manual. By the time the loop instruction is
> executed the second time all code will be within the cache even if
> there was a penalty on the first execution.

Explain why my implementation of your code is not as fast as you say
it should be. Also note that the timings in the manual are very slow:
lodsw takes 5 cycles on a 486 + one cycle for the size prefix.

> > You mean the lodsw and loop instructions? Those have been losers
> > since the 386.
>
> Not true. These built-in macros are responsible for much of the performance
> improvements over chips such as the 68k providing the developer took the
> time to use them.

The risciness of the 486 architecture is responsible for much of the
performance improvement over the 386. Combinations of simple single
cycle instructions execute more quickly with greater flexibility than
the complex instructions which use dedicated registers. Intel
themselves point this out in Appendix G of the i486 Microprocessor
Programmer's Reference Manual.
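
As a rough illustration of the kind of loop being argued about, here is
a minimal C sketch of a 16-bit ones-complement sum. It is not the Linux
kernel routine and not Richard's code; the name csum_fold_sum is made up
for this example. The point is only that each iteration is a plain load,
add, pointer bump and branch, the simple operations one would hand-code
in place of lodsw and loop.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper for illustration only.
 * Returns the ones-complement sum of `len` bytes, folded to 16 bits
 * (the final inversion used by IP is left to the caller).
 * Assumes len is packet-sized (well under 128 KB), so the 32-bit
 * accumulator cannot overflow before the fold.
 */
uint16_t csum_fold_sum(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;

    /* One simple load/add/advance per 16-bit word. */
    while (len >= 2) {
        sum += ((uint32_t)buf[0] << 8) | buf[1];  /* network byte order */
        buf += 2;
        len -= 2;
    }

    /* An odd trailing byte is treated as if padded with a zero byte. */
    if (len)
        sum += (uint32_t)buf[0] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)sum;
}
```

How a compiler or a hand coder schedules this on a 486 is a separate
question; the sketch only shows the work the loop has to do.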

> I think there is a problem with testing the speed of the checksum
> routines because changes in execution speed are way down in the
> noise level when you test "networking" results.

That's why I don't test networking results. I test this stuff out of
the kernel. I use the times() function on a nearly idle system and
get very consistent results.
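
A minimal sketch of that kind of user-space measurement, assuming a
POSIX system. The names user_seconds, csum_fn and simple_sum are
hypothetical and are not the actual test harness; simple_sum is only a
stand-in for whichever routine is being compared.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/times.h>
#include <unistd.h>

/* Signature assumed for the routines under test. */
typedef uint16_t (*csum_fn)(const unsigned char *, size_t);

/* Stand-in routine so the sketch is self-contained. */
uint16_t simple_sum(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return (uint16_t)sum;
}

/* Written on every call so the compiler cannot drop the work. */
static volatile uint16_t sink;

/*
 * Run fn over buf `iterations` times and return the user CPU time
 * consumed, in seconds, as reported by times().
 */
double user_seconds(csum_fn fn, const unsigned char *buf, size_t len,
                    long iterations)
{
    struct tms start, end;
    long ticks_per_sec = sysconf(_SC_CLK_TCK);

    times(&start);
    for (long i = 0; i < iterations; i++)
        sink = fn(buf, len);
    times(&end);

    return (double)(end.tms_utime - start.tms_utime) / (double)ticks_per_sec;
}
```

The clock tick is coarse, so the iteration count has to be large enough
that the run lasts many ticks; comparing two routines then means calling
user_seconds once per routine with the same buffer and count.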

> The fact that you measure an order-of-magnitude difference
> between your routines and the usual, which I have used for many years,
> means that either something is wrong with the measurement or something
> is wrong with the hardware.

Just because you've used your code for many years is no reason to hold
it above suspicion.

Would you consider performance testing the Linux code with your
hardware if I converted it to Intel format?

Tom.