Re: MMX performance....

Oliver Xymoron
Thu, 6 Feb 1997 16:21:48 -0600 (CST)

On 6 Feb 1997, Robert Krawitz wrote:

> In article <> Ingo Molnar <> writes:
> On Thu, 6 Feb 1997, Dale R. Worley wrote:
> > From what I understand, every time you switch between MMX mode and regular
> > FP mode, 100 or so cycles are burned. If you are context switching
> > a lot (any multitasking environment), this would seem to add up.
> >
> > Assuming that the "cycles" are fundamental CPU cycles (as opposed to
> > memory accesses, or something), that could take 1 microsecond or less
> > (depending on your clock speed), which isn't much. [...]
> 2.1.25 does a system call in 150 cycles and context switches in 190 cycles
> microseconds. Wanna add 100 cycles to each memory copy operation?
> 100 cycles are a lot. And XFree86 is rendering fonts using the FPU. And we
> have the pentium memcpy patch which uses the FPU for 64 bit wide memory
> copy.
> Hmm. I haven't had a chance to look at the MMX instruction set, but
> I'll be shocked, SHOCKED, if the MMX instruction set doesn't have 64
> bit memory transfer instructions. Perhaps a logical alternative would
> be to implement the Pentium memcpy in terms of whichever FPU/MMX mode
> was in effect at the time.

I'd be shocked as well. Most of the core instruction times are listed as
1, and from what I can tell, it's just a hack on the already existing
functional units in the FPU, taking advantage of the fast multiplier, etc.

> The Pentium memcpy() patch, BTW, has a lot of overhead of its own; it
> dumps and restores the FPU state (when it's in use it dumps the
> registers; when not, it dumps just the rest of the state). That's why
> it's configured to operate only when the amount of data to be copied
> is large. The overhead is well worth it, though, since memory
> bandwidth on write is used so much more efficiently.

What's the break-even copy size? Your patch seems to suggest 512 or 1024
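
For what it's worth, the arithmetic behind both questions in this thread (how long 100 cycles really is, and where a copy with a fixed setup cost starts to win) can be sketched roughly as below. This is only a back-of-the-envelope linear model, not anything taken from the patch: the function names and the per-byte cost parameters are made up for illustration, and real numbers would have to be measured on the machine in question.

```c
#include <stddef.h>

/* Time in microseconds taken by `cycles` CPU cycles on a `mhz` MHz clock.
 * 1 MHz is one cycle per microsecond, so this is a plain division. */
static double cycles_to_us(double cycles, double mhz)
{
    return cycles / mhz;
}

/* Break-even copy size under an ASSUMED linear cost model:
 *
 *   plain copy: n * plain_cpb                     cycles
 *   wide copy:  overhead_cycles + n * wide_cpb    cycles
 *
 * where the *_cpb values are cycles per byte (hypothetical parameters,
 * not measurements). Returns the smallest n at which the wide copy is
 * no slower than the plain one, or 0 if it never catches up. */
static size_t break_even_bytes(double overhead_cycles,
                               double plain_cpb, double wide_cpb)
{
    double saved_per_byte = plain_cpb - wide_cpb;
    double n;
    size_t whole;

    if (saved_per_byte <= 0.0)
        return 0;               /* wide copy saves nothing per byte */

    n = overhead_cycles / saved_per_byte;
    whole = (size_t)n;
    return (n > (double)whole) ? whole + 1 : whole;   /* round up */
}
```

So cycles_to_us(100, 200) puts 100 cycles at half a microsecond on a 200 MHz part, and a purely illustrative break_even_bytes(512, 1.0, 0.5) comes out at 1024 bytes. Those inputs are invented to show the shape of the calculation, nothing more.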

 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."