RE: AMD optimizations?

Gaddoni Marco (marco.gaddoni@imola.nettuno.it)
Fri, 27 Aug 1999 08:14:34 +0200


Alan Cox Wrote:
>The intel cpus have fastpaths for rep movs so for everything but the 486
>and some pentiums it seems that the 386 code is fastest. On a K6-II I can't
>beat rep movs for memory originated reads. Im not sure about cached stuff.

I can get improved perfomance from my k6-II 375/75 (overclocked) with this
code for zero page:

asm __volatile__ ( \
" prefetchw (%0); \n" \
" movl $0xff, %1 \n" \
" xorl %2,%2 \n" \
"1: movl %2,(%0) \n" \
" prefetchw 0x10(%0) \n" \
" movl %2,0x4(%0) \n" \
" movl %2,0x8(%0) \n" \
" movl %2,0xc(%0) \n" \
" addl $0x10,%0 \n" \
" decl %1 \n" \
" jns 1b " \
:"=r" (page), "=r" (tmp1), "=r" (tmp2) : "0" (page): "memory" );

I can clean a page not in cache (if i got my benchmark correct) in 17690
clock cycles (measured with rdtsc) vs. 21224 for rep stol ...

This is the tree register version. The speed is the same if you use a
movl $0x0,(%0) in place of movl %2,(%0) but the icache footprint is smaller.

-- 
This is not a Sig. (With homage to Magritte).

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/