Re: [PATCH] x86_64/lib: improve the performance of memmove

From: Miao Xie
Date: Thu Sep 16 2010 - 07:47:30 EST


On Thu, 16 Sep 2010 18:47:59 +0800, Miao Xie wrote:
On Thu, 16 Sep 2010 12:11:41 +0200, Andi Kleen wrote:
On Thu, 16 Sep 2010 17:29:32 +0800, Miao Xie <miaox@xxxxxxxxxxxxxx> wrote:


OK, that was a very broken patch. Sorry, I should really have done some
more work on it. Anyway, hopefully the corrected version is good for
testing.

-Andi

The test results are as follows:
Len   Src Unalign  Dest Unalign  Patch applied  Without Patch
----  -----------  ------------  -------------  -------------
   8            0             0    0s 421117us     0s 70203us
   8            0             3    0s 252622us     0s 42114us
   8            0             7    0s 252663us     0s 42111us
   8            3             0    0s 252666us     0s 42114us
   8            3             3    0s 252667us     0s 42113us
   8            3             7    0s 252667us     0s 42112us
  32            0             0    0s 252672us    0s 114301us
  32            0             3    0s 252676us    0s 114306us
  32            0             7    0s 252663us    0s 114300us
  32            3             0    0s 252661us    0s 114305us
  32            3             3    0s 252663us    0s 114300us
  32            3             7    0s 252668us    0s 114304us
  64            0             0    0s 252672us    0s 236119us
  64            0             3    0s 264671us    0s 236120us
  64            0             7    0s 264702us    0s 236127us
  64            3             0    0s 270701us    0s 236128us
  64            3             3    0s 287236us    0s 236809us
  64            3             7    0s 287257us    0s 236123us

According to the results above, the old version is faster than the new one
when the memory area is small (64 bytes or less in this test).

Len   Src Unalign  Dest Unalign  Patch applied  Without Patch
----  -----------  ------------  -------------  -------------
 256            0             0    0s 281886us    0s 813660us
 256            0             3    0s 332169us    0s 813645us
 256            0             7    0s 342961us    0s 813639us
 256            3             0    0s 305305us    0s 813634us
 256            3             3    0s 386939us    0s 813638us
 256            3             7    0s 370511us    0s 814335us
 512            0             0    0s 310716us    1s 584677us
 512            0             3    0s 456420us    1s 583353us
 512            0             7    0s 468236us    1s 583248us
 512            3             0    0s 493987us    1s 583659us
 512            3             3    0s 588041us    1s 584294us
 512            3             7    0s 605489us    1s 583650us
1024            0             0    0s 406971us    3s 123644us
1024            0             3    0s 748419us    3s 126514us
1024            0             7    0s 756153us    3s 127178us
1024            3             0    0s 854681us    3s 130013us
1024            3             3     1s 46828us    3s 140190us
1024            3             7     1s 35886us    3s 135508us

The new version is faster when the memory area is large (256 bytes or more
in this test).
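
The test program itself is not part of this mail, so the following is only
a rough userspace sketch of how one row of the tables above could be
measured. The helper names time_memmove_us() and format_elapsed(), the
buffer layout, and the use of clock() are all assumptions made for
illustration. It also times the C library's memmove, not the kernel
routine, so it will not reproduce the numbers above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: render microseconds in the "Ns Mus" style of the tables. */
int format_elapsed(char *out, size_t out_size, long long total_us)
{
	return snprintf(out, out_size, "%llds %lldus",
			total_us / 1000000, total_us % 1000000);
}

/*
 * Hypothetical helper: time `iters` memmove calls for one table row and
 * return the elapsed CPU time in microseconds.
 */
long long time_memmove_us(size_t len, size_t src_unalign,
			  size_t dst_unalign, long iters)
{
	/* One 64-byte-aligned buffer; dst sits above src so that dest > src. */
	static _Alignas(64) unsigned char buf[4096];
	/* Call through a volatile pointer so the compiler cannot drop the calls. */
	void *(*volatile mover)(void *, const void *, size_t) = memmove;
	unsigned char *src = buf + src_unalign;
	unsigned char *dst = buf + 2048 + dst_unalign;
	clock_t start, end;
	long i;

	/* Assumed limit: each half of the buffer holds offset plus length. */
	assert(src_unalign + len <= 2048 && dst_unalign + len <= 2048);

	start = clock();
	for (i = 0; i < iters; i++)
		mover(dst, src, len);
	end = clock();

	return (long long)(end - start) * 1000000 / CLOCKS_PER_SEC;
}
```

Calling time_memmove_us() once per (Len, Src Unalign, Dest Unalign)
combination and printing the result with format_elapsed() would give one
column of such a table. The iteration count is left to the caller because
the count used for the numbers above is not stated.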

Thanks!
Miao



title: x86_64/lib: improve the performance of memmove

Implement the 64-bit memmove backwards case using string instructions.

Signed-off-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>
Signed-off-by: Miao Xie <miaox@xxxxxxxxxxxxxx>
---
arch/x86/lib/memcpy_64.S | 29 +++++++++++++++++++++++++++++
arch/x86/lib/memmove_64.c | 8 ++++----
2 files changed, 33 insertions(+), 4 deletions(-)
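
For readers who prefer C to assembly, here is a minimal model of what the
memcpy_backwards routine in this patch appears to do: copy count / 8
eight-byte chunks starting from the end of the buffers, then copy the
remaining count % 8 bytes one at a time, still moving towards lower
addresses. The name copy_backwards_model() is hypothetical and this is a
sketch for illustration only, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of a backwards copy; safe when dst > src and they overlap. */
void *copy_backwards_model(void *dst, const void *src, size_t count)
{
	unsigned char *d = (unsigned char *)dst + count;
	const unsigned char *s = (const unsigned char *)src + count;
	size_t qwords = count >> 3;	/* plays the role of shrl $3, %ecx */
	size_t tail = count & 7;	/* plays the role of andl $7, %edx */

	/* Eight bytes at a time from the top, like rep movsq with DF set. */
	while (qwords--) {
		uint64_t chunk;

		d -= 8;
		s -= 8;
		/* Load the whole chunk before storing it, so overlap is harmless. */
		memcpy(&chunk, s, sizeof(chunk));
		memcpy(d, &chunk, sizeof(chunk));
	}

	/* Remaining low bytes, one at a time, like rep movsb with DF set. */
	while (tail--)
		*--d = *--s;

	return dst;
}
```

With a 32-byte buffer whose bytes hold 0..31, calling
copy_backwards_model(buf + 3, buf, 21) leaves the values 0..20 in
buf[3]..buf[23], which is the result memmove is expected to give for that
overlapping case.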

diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
index bcbcd1e..9de5e9a 100644
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -141,3 +141,32 @@ ENDPROC(__memcpy)
 	.byte .Lmemcpy_e - .Lmemcpy_c
 	.byte .Lmemcpy_e - .Lmemcpy_c
 	.previous
+
+/*
+ * Copy memory backwards (for memmove)
+ * rdi target
+ * rsi source
+ * rdx count
+ */
+
+ENTRY(memcpy_backwards)
+	CFI_STARTPROC
+	std
+	movq %rdi, %rax
+	movl %edx, %ecx
+	addq %rdx, %rdi
+	addq %rdx, %rsi
+	leaq -8(%rdi), %rdi
+	leaq -8(%rsi), %rsi
+	shrl $3, %ecx
+	andl $7, %edx
+	rep movsq
+	addq $7, %rdi
+	addq $7, %rsi
+	movl %edx, %ecx
+	rep movsb
+	cld
+	ret
+	CFI_ENDPROC
+ENDPROC(memcpy_backwards)
+
diff --git a/arch/x86/lib/memmove_64.c b/arch/x86/lib/memmove_64.c
index 0a33909..6774fd8 100644
--- a/arch/x86/lib/memmove_64.c
+++ b/arch/x86/lib/memmove_64.c
@@ -5,16 +5,16 @@
 #include <linux/string.h>
 #include <linux/module.h>

+extern void * asmlinkage memcpy_backwards(void *dst, const void *src,
+					  size_t count);
+
 #undef memmove
 void *memmove(void *dest, const void *src, size_t count)
 {
 	if (dest < src) {
 		return memcpy(dest, src, count);
 	} else {
-		char *p = dest + count;
-		const char *s = src + count;
-		while (count--)
-			*--p = *--s;
+		return memcpy_backwards(dest, src, count);
 	}
 	return dest;
 }
