Re: AMD optimizations?

Alan Cox (alan@lxorguk.ukuu.org.uk)
Thu, 26 Aug 1999 20:21:53 +0100 (BST)


> > I looked at it - its hard, very hard. MMX has no carry bit so you end up
> > having to do a cycle of
> >
> > copy original 4 words
> > add 4 new words
> > for each new word < old word
> > add one
>
> As I pointed out a long time ago, you can do the whole thing 32-bits wide
> and correct it at the end. No conditionals or carries required in the main
> loop.

You mean pmov 8 bytes in word->long expand the 16 bit chunks into a pair
of 64 bit registers and then do a pair of adds. That might be fractionally
faster but remembr the if / add 1 stuff involves no jumps because you can
use the mmx conditional, an and operation and an add.

Also by expanding it you need more registers. You will need two for the
sum, two for the expanded value, and one for the input, which means you can't
have two sets of interleaved loops ?

Alan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/