Re: [RFC] x86-64: Use SSE for copy_page and clear_page

From: Michael Thonke
Date: Mon May 30 2005 - 15:43:51 EST


Benjamin LaHaise schrieb:

>On Mon, May 30, 2005 at 10:05:28PM +0200, Michael Thonke wrote:
>
>
>>No it doesn't like this sample here at all,I'll get segmentationfault on
>>that run.
>>
>>
>
>Grab a new copy -- one of the routines had an unaligned store instead of
>aligned for the register save.
>
> -ben
>
>
>
Hi Benjamin,

Here are the results with the new copy.

*RUN 1: cc -o xmm64.o xmm64.c*

ioGL64NX_EMT64 ~ # ./xmm64.o
SSE test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
buffer = 0x2aaaaade7000
clear_page() tests
clear_page function 'warm up run' took 13632 cycles per page
clear_page function 'kernel clear' took 6599 cycles per page
clear_page function '2.4 non MMX' took 6482 cycles per page
clear_page function '2.4 MMX fallback' took 6367 cycles per page
clear_page function '2.4 MMX version' took 6644 cycles per page
clear_page function 'faster_clear_page' took 6088 cycles per page
clear_page function 'even_faster_clear' took 5692 cycles per page
clear_page function 'xmm_clear' took 4270 cycles per page
clear_page function 'xmma_clear' took 6351 cycles per page
clear_page function 'xmm2_clear' took 4710 cycles per page
clear_page function 'xmma2_clear' took 6198 cycles per page
clear_page function 'xmm3_clear' took 6583 cycles per page
clear_page function 'nt clear ' took 4746 cycles per page
clear_page function 'kernel clear' took 6158 cycles per page

copy_page() tests
copy_page function 'warm up run' took 9210 cycles per page
copy_page function '2.4 non MMX' took 6740 cycles per page
copy_page function '2.4 MMX fallback' took 6697 cycles per page
copy_page function '2.4 MMX version' took 9178 cycles per page
copy_page function 'faster_copy' took 11360 cycles per page
copy_page function 'even_faster' took 10133 cycles per page
copy_page function 'xmm_copy_page_no' took 8885 cycles per page
copy_page function 'xmm_copy_page' took 8725 cycles per page
copy_page function 'xmma_copy_page' took 9964 cycles per page
copy_page function 'xmm3_copy_page' took 7176 cycles per page
copy_page function 'v26_copy_page' took 6879 cycles per page
copy_page function 'nt_copy_page' took 10858 cycles per page


*RUN 2: gcc -o xmm64.o xmm64.c*

ioGL64NX_EMT64 ~ # ./xmm64.o
SSE test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
buffer = 0x2aaaaade7000
clear_page() tests
clear_page function 'warm up run' took 13981 cycles per page
clear_page function 'kernel clear' took 6708 cycles per page
clear_page function '2.4 non MMX' took 6505 cycles per page
clear_page function '2.4 MMX fallback' took 6235 cycles per page
clear_page function '2.4 MMX version' took 7251 cycles per page
clear_page function 'faster_clear_page' took 6390 cycles per page
clear_page function 'even_faster_clear' took 5932 cycles per page
clear_page function 'xmm_clear' took 4876 cycles per page
clear_page function 'xmma_clear' took 6379 cycles per page
clear_page function 'xmm2_clear' took 5264 cycles per page
clear_page function 'xmma2_clear' took 6373 cycles per page
clear_page function 'xmm3_clear' took 6651 cycles per page
clear_page function 'nt clear ' took 5186 cycles per page
clear_page function 'kernel clear' took 6326 cycles per page

copy_page() tests
copy_page function 'warm up run' took 9537 cycles per page
copy_page function '2.4 non MMX' took 6776 cycles per page
copy_page function '2.4 MMX fallback' took 7407 cycles per page
copy_page function '2.4 MMX version' took 8812 cycles per page
copy_page function 'faster_copy' took 10992 cycles per page
copy_page function 'even_faster' took 10232 cycles per page
copy_page function 'xmm_copy_page_no' took 8918 cycles per page
copy_page function 'xmm_copy_page' took 9579 cycles per page
copy_page function 'xmma_copy_page' took 9854 cycles per page
copy_page function 'xmm3_copy_page' took 7602 cycles per page
copy_page function 'v26_copy_page' took 6811 cycles per page
copy_page function 'nt_copy_page' took 10958 cycles per page

*RUN 3: gcc -pipe -march=nocona -O2 -o xmm64.o xmm64.c
*
SSE test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
buffer = 0x2aaaaade7000
clear_page() tests
clear_page function 'warm up run' took 13626 cycles per page
clear_page function 'kernel clear' took 6780 cycles per page
clear_page function '2.4 non MMX' took 6755 cycles per page
clear_page function '2.4 MMX fallback' took 6283 cycles per page
clear_page function '2.4 MMX version' took 6764 cycles per page
clear_page function 'faster_clear_page' took 5764 cycles per page
clear_page function 'even_faster_clear' took 5240 cycles per page
clear_page function 'xmm_clear' took 4532 cycles per page
clear_page function 'xmma_clear' took 6352 cycles per page
clear_page function 'xmm2_clear' took 4983 cycles per page
clear_page function 'xmma2_clear' took 6211 cycles per page
clear_page function 'xmm3_clear' took 6748 cycles per page
clear_page function 'nt clear ' took 5166 cycles per page
clear_page function 'kernel clear' took 6201 cycles per page

copy_page() tests
copy_page function 'warm up run' took 9651 cycles per page
copy_page function '2.4 non MMX' took 6724 cycles per page
copy_page function '2.4 MMX fallback' took 6905 cycles per page
copy_page function '2.4 MMX version' took 9722 cycles per page
copy_page function 'faster_copy' took 9738 cycles per page
copy_page function 'even_faster' took 9609 cycles per page
copy_page function 'xmm_copy_page_no' took 8846 cycles per page
copy_page function 'xmm_copy_page' took 8591 cycles per page
copy_page function 'xmma_copy_page' took 8250 cycles per page
copy_page function 'xmm3_copy_page' took 7879 cycles per page
copy_page function 'v26_copy_page' took 7512 cycles per page
copy_page function 'nt_copy_page' took 10424 cycles per page

RUN 4: *gcc -pipe -march=nocona -O2 -fPIC -o xmm64.o xmm64.c*

SSE test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
buffer = 0x2aaaaade7000
clear_page() tests
clear_page function 'warm up run' took 13713 cycles per page
clear_page function 'kernel clear' took 6655 cycles per page
clear_page function '2.4 non MMX' took 6448 cycles per page
clear_page function '2.4 MMX fallback' took 6270 cycles per page
clear_page function '2.4 MMX version' took 7001 cycles per page
clear_page function 'faster_clear_page' took 5671 cycles per page
clear_page function 'even_faster_clear' took 5366 cycles per page
clear_page function 'xmm_clear' took 4737 cycles per page
clear_page function 'xmma_clear' took 6464 cycles per page
clear_page function 'xmm2_clear' took 5214 cycles per page
clear_page function 'xmma2_clear' took 6371 cycles per page
clear_page function 'xmm3_clear' took 6660 cycles per page
clear_page function 'nt clear ' took 5066 cycles per page
clear_page function 'kernel clear' took 6314 cycles per page

copy_page() tests
copy_page function 'warm up run' took 9464 cycles per page
copy_page function '2.4 non MMX' took 7179 cycles per page
copy_page function '2.4 MMX fallback' took 6928 cycles per page
copy_page function '2.4 MMX version' took 9091 cycles per page
copy_page function 'faster_copy' took 9996 cycles per page
copy_page function 'even_faster' took 9824 cycles per page
copy_page function 'xmm_copy_page_no' took 8724 cycles per page
copy_page function 'xmm_copy_page' took 8920 cycles per page
copy_page function 'xmma_copy_page' took 8859 cycles per page
copy_page function 'xmm3_copy_page' took 7794 cycles per page
copy_page function 'v26_copy_page' took 7808 cycles per page
copy_page function 'nt_copy_page' took 9264 cycles per page

Do you need more results or tests Benjamin?

Greets and best regards
Michael

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/