Re: [PATCH] x86_64/lib: improve the performance of memmove

From: Miao Xie
Date: Thu Sep 16 2010 - 05:29:33 EST


On Thu, 16 Sep 2010 10:40:08 +0200, Andi Kleen wrote:
On Thu, 16 Sep 2010 15:16:31 +0800
Miao Xie<miaox@xxxxxxxxxxxxxx> wrote:

On Thu, 16 Sep 2010 08:48:25 +0200 (cest), Andi Kleen wrote:
When the dest and the src do overlap and the memory area is large,
memmove of
x86_64 is very inefficient, and it led to bad performance, such as
btrfs's file
deletion performance. This patch improved the performance of
memmove on x86_64
by using __memcpy_bwd() instead of byte copy when doing large
memory area copy
(len> 64).


I still don't understand why you don't simply use a backwards
string copy (with std) ? That should be much simpler and
hopefully be as optimized for kernel copies on recent CPUs.

But according to the comment of memcpy, some CPUs don't support "REP"
instruction,

I think you misread the comment. REP prefixes are in all x86 CPUs.
On some very old systems it wasn't optimized very well,
but it probably doesn't make too much sense to optimize for those.
On newer CPUs in fact REP should be usually faster than
an unrolled loop.

I'm not sure how optimized the backwards copy is, but most likely
it is optimized too.

Here's an untested patch that implements backwards copy with string
instructions. Could you run it through your test harness?

Ok, I'll do it.

+
+/*
+ * Copy memory backwards (for memmove)
+ * rdi target
+ * rsi source
+ * rdx count
+ */
+
+ENTRY(memcpy_backwards):

s/://

+ CFI_STARTPROC
+ std
+ movq %rdi, %rax
+ movl %edx, %ecx
+ add %rdx, %rdi
+ add %rdx, %rsi

- add %rdx, %rdi
- add %rdx, %rsi
+ addq %rdx, %rdi
+ addq %rdx, %rsi

Besides that, the address that %rdi/%rsi pointed to is over the end of
mempry area that going to be copied, so we must tune the address to be
correct.
+ leaq -8(%rdi), %rdi
+ leaq -8(%rsi), %rsi

+ shrl $3, %ecx
+ andl $7, %edx
+ rep movsq

The same as above.
+ leaq 8(%rdi), %rdi
+ leaq 8(%rsi), %rsi
+ decq %rsi
+ decq %rdi

+ movl %edx, %ecx
+ rep movsb
+ cld
+ ret
+ CFI_ENDPROC
+ENDPROC(memcpy_backwards)
+
diff --git a/arch/x86/lib/memmove_64.c b/arch/x86/lib/memmove_64.c
index 0a33909..6c00304 100644
--- a/arch/x86/lib/memmove_64.c
+++ b/arch/x86/lib/memmove_64.c
@@ -5,16 +5,16 @@
#include<linux/string.h>
#include<linux/module.h>

+extern void asmlinkage memcpy_backwards(void *dst, const void *src,
+ size_t count);

The type of the return value must be "void *".

Thanks
Miao

+
#undef memmove
void *memmove(void *dest, const void *src, size_t count)
{
if (dest< src) {
return memcpy(dest, src, count);
} else {
- char *p = dest + count;
- const char *s = src + count;
- while (count--)
- *--p = *--s;
+ return memcpy_backwards(dest, src, count);
}
return dest;
}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/