Re: AMD optimizations?

Oliver Xymoron (oxymoron@waste.org)
Thu, 26 Aug 1999 14:59:59 -0500 (CDT)


On Thu, 26 Aug 1999, Alan Cox wrote:

> You mean pmov 8 bytes in word->long expand the 16 bit chunks into a pair
> of 64 bit registers and then do a pair of adds. That might be fractionally
> faster but remembr the if / add 1 stuff involves no jumps because you can
> use the mmx conditional, an and operation and an add.
>
> Also by expanding it you need more registers. You will need two for the
> sum, two for the expanded value, and one for the input, which means you can't
> have two sets of interleaved loops ?

Why do you need two registers for the sum or two scratch registers? I
haven't looked at the instruction set since the first MMX chips came out
but I seem to remember you could do:

64 bit read to reg0
Unpack lower words of reg0 to 32-bit values in reg1
Add reg1 to reg2
Unpack upper reg0 to reg1
Add reg1 to reg2

If that doesn't pipeline well, or if you get a benefit out of doing two
sets of 8 words in parallel, there are a number of ways to shuffle things
around.

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.." 

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/