Re: MMX performance....

Robert Krawitz
6 Feb 1997 18:46:56 GMT

In article <> Ingo Molnar <> writes:

On Thu, 6 Feb 1997, Dale R. Worley wrote:

> From what I understand, every time you switch between MMX mode and regular
> FP mode, 100 or so cycles are burned. If you are context switching
> a lot (any multitasking environment), this would seem to add up.
> Assuming that the "cycles" are fundamental CPU cycles (as opposed to
> memory accesses, or something), that could take 1 microsecond or less
> (depending on your clock speed), which isn't much. [...]

2.1.25 does a system call in 150 cycles and a context switch in 190
cycles. Wanna add 100 cycles to each memory copy operation?
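
To put the cycle counts in this exchange on a common scale, here is a minimal sketch in C. The function names are hypothetical, and the clock rates mentioned in the note after the code are assumptions for illustration, not figures from the thread.

```c
/* Convert a cycle count to microseconds.
 * A clock of N MHz executes N cycles per microsecond. */
static double cycles_to_us(double cycles, double clock_mhz)
{
    return cycles / clock_mhz;
}

/* Extra cost expressed as a fraction of an existing cost,
 * e.g. a 100-cycle mode switch against a 190-cycle context switch. */
static double relative_overhead(double extra_cycles, double base_cycles)
{
    return extra_cycles / base_cycles;
}
```

At an assumed 100 MHz clock, cycles_to_us(100, 100) gives 1 microsecond, and at 200 MHz half that, which matches the "1 microsecond or less" estimate. Against the 190-cycle context switch figure, relative_overhead(100, 190) is a little over one half, which is the point of the objection: small in absolute time, large relative to the operations it would be added to.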

100 cycles are a lot. And XFree86 is rendering fonts using the FPU. And we
have the Pentium memcpy patch, which uses the FPU for 64-bit-wide memory [...]

Hmm. I haven't had a chance to look at the MMX instruction set, but
I'll be shocked, SHOCKED, if it doesn't have 64-bit memory transfer
instructions. Perhaps a logical alternative would be to implement the
Pentium memcpy in terms of whichever mode, FPU or MMX, is in effect at
the time of the call.
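
For readers unfamiliar with the idea of a 64-bit-wide copy, here is a minimal sketch in portable C. It is not the Pentium memcpy patch and uses neither FPU nor MMX instructions; copy64 is a hypothetical name, and the fixed-size memcpy calls merely stand in for whatever 64-bit load and store the hardware mode provides.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes, moving 8 bytes per iteration
 * where possible and finishing the remainder one byte at a time. */
static void *copy64(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= sizeof(uint64_t)) {
        uint64_t word;
        memcpy(&word, s, sizeof word);  /* alignment-safe 64-bit load  */
        memcpy(d, &word, sizeof word);  /* alignment-safe 64-bit store */
        s += sizeof word;
        d += sizeof word;
        n -= sizeof word;
    }
    while (n > 0) {                     /* tail of fewer than 8 bytes */
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

The suggestion in the paragraph above amounts to keeping this loop shape and swapping the 64-bit load/store pair for the FPU or MMX form, depending on which mode the processor is already in.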

The Pentium memcpy() patch, BTW, has a lot of overhead of its own; it
dumps and restores the FPU state (when the FPU is in use it dumps the
registers as well; when not, it dumps just the rest of the state).
That's why it's configured to operate only when the amount of data to
be copied is large. The overhead is well worth it, though, since memory
bandwidth on write is used so much more efficiently.
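
The size cutoff described above can be sketched as follows. This is an illustration under stated assumptions, not the patch's code: the threshold value is invented, save_fpu_state() and restore_fpu_state() are hypothetical stubs that only count calls, and plain memcpy stands in for the wide copy loop.

```c
#include <stddef.h>
#include <string.h>

/* Assumed cutoff for illustration; the patch's real threshold is not
 * stated in this thread. */
#define WIDE_COPY_MIN_BYTES 1024u

/* Hypothetical stubs: a real implementation would save and restore
 * FPU state here. These only count how often the cost is paid. */
static unsigned state_saves;
static unsigned state_restores;

static void save_fpu_state(void)    { state_saves++; }
static void restore_fpu_state(void) { state_restores++; }

/* Pay the fixed save/restore cost only when the copy is large enough
 * for the faster per-byte rate to win it back. */
static void *copy_with_threshold(void *dst, const void *src, size_t n)
{
    if (n < WIDE_COPY_MIN_BYTES)
        return memcpy(dst, src, n);     /* small copy: no setup cost */

    save_fpu_state();
    memcpy(dst, src, n);                /* stand-in for the wide copy loop */
    restore_fpu_state();
    return dst;
}
```

Small copies never touch the state-saving path, so the fixed cost is spread only over transfers long enough to benefit from the better write bandwidth.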

Robert Krawitz <> 

Member of the League for Programming Freedom
Tall Clubs International -- 1-800-521-2512