[PATCH] x86/uaccess: use unrolled string copy for short strings

From: Paolo Abeni
Date: Wed Jun 21 2017 - 07:13:41 EST


The 'rep' prefix carries a significant setup cost; as a result, for
short strings, copies using unrolled loops are faster than even the
optimized string copy that uses the 'rep' variant.

This change updates copy_user_generic() to use the unrolled
version for short strings. The threshold for a short string,
64 bytes, was selected empirically as the largest value that
still ensures a measurable gain.
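
The dispatch described above can be sketched in userspace as follows.
This is only an illustration, not the kernel code: SHORT_COPY_MAX,
copy_short_unrolled(), copy_bulk() and copy_dispatch() are hypothetical
names, memcpy() stands in for the 'rep'-based path, and the short path
is assumed to copy 8-byte words followed by a byte tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical threshold mirroring the 64-byte cutoff in the patch. */
#define SHORT_COPY_MAX 64

/* Hypothetical stand-in for the unrolled path: 8-byte words, then a byte tail. */
static void copy_short_unrolled(void *to, const void *from, size_t len)
{
	unsigned char *d = to;
	const unsigned char *s = from;

	while (len >= 8) {
		uint64_t w;

		memcpy(&w, s, sizeof(w));	/* alignment-safe 8-byte load */
		memcpy(d, &w, sizeof(w));	/* alignment-safe 8-byte store */
		d += 8;
		s += 8;
		len -= 8;
	}
	while (len--)
		*d++ = *s++;
}

/* Hypothetical stand-in for the 'rep'-based bulk path. */
static void copy_bulk(void *to, const void *from, size_t len)
{
	memcpy(to, from, len);
}

/* Short lengths take the unrolled path; everything else takes the bulk path. */
static void copy_dispatch(void *to, const void *from, size_t len)
{
	assert(len == 0 || (to && from));

	if (len <= SHORT_COPY_MAX)
		copy_short_unrolled(to, from, len);
	else
		copy_bulk(to, from, len);
}
```

The length check comes first so that short copies never reach the
bulk path at all.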

A micro-benchmark of __copy_from_user() with different lengths shows
the following:

string len   vanilla   patched   delta
   (bytes)   (ticks)   (ticks)   ticks(%)

         0        58        26   32(55%)
         1        49        29   20(40%)
         2        49        31   18(36%)
         3        49        32   17(34%)
         4        50        34   16(32%)
         5        49        35   14(28%)
         6        49        36   13(26%)
         7        49        38   11(22%)
         8        50        31   19(38%)
         9        51        33   18(35%)
        10        52        36   16(30%)
        11        52        37   15(28%)
        12        52        38   14(26%)
        13        52        40   12(23%)
        14        52        41   11(21%)
        15        52        42   10(19%)
        16        51        34   17(33%)
        17        51        35   16(31%)
        18        52        37   15(28%)
        19        51        38   13(25%)
        20        52        39   13(25%)
        21        52        40   12(23%)
        22        51        42   9(17%)
        23        51        46   5(9%)
        24        52        35   17(32%)
        25        52        37   15(28%)
        26        52        38   14(26%)
        27        52        39   13(25%)
        28        52        40   12(23%)
        29        53        42   11(20%)
        30        52        43   9(17%)
        31        52        44   8(15%)
        32        51        36   15(29%)
        33        51        38   13(25%)
        34        51        39   12(23%)
        35        51        41   10(19%)
        36        52        41   11(21%)
        37        52        43   9(17%)
        38        51        44   7(13%)
        39        52        46   6(11%)
        40        51        37   14(27%)
        41        50        38   12(24%)
        42        50        39   11(22%)
        43        50        40   10(20%)
        44        50        42   8(16%)
        45        50        43   7(14%)
        46        50        43   7(14%)
        47        50        45   5(10%)
        48        50        37   13(26%)
        49        49        38   11(22%)
        50        50        40   10(20%)
        51        50        42   8(16%)
        52        50        42   8(16%)
        53        49        46   3(6%)
        54        50        46   4(8%)
        55        49        48   1(2%)
        56        50        39   11(22%)
        57        50        40   10(20%)
        58        49        42   7(14%)
        59        50        42   8(16%)
        60        50        46   4(8%)
        61        50        47   3(6%)
        62        50        48   2(4%)
        63        50        48   2(4%)
        64        51        38   13(25%)

Above 64 bytes the gain fades away.
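
For reference, the delta column in the table appears to be the tick
difference between vanilla and patched, with the percentage taken
relative to the vanilla figure and truncated to an integer. A minimal
sketch of that arithmetic, using the hypothetical helper names
delta_ticks() and delta_percent():

```c
#include <assert.h>

/* Ticks saved by the patched kernel for one string length. */
static unsigned delta_ticks(unsigned vanilla, unsigned patched)
{
	assert(patched <= vanilla);
	return vanilla - patched;
}

/* Saving as a percentage of the vanilla cost, truncated toward zero. */
static unsigned delta_percent(unsigned vanilla, unsigned patched)
{
	assert(vanilla > 0);
	return delta_ticks(vanilla, patched) * 100 / vanilla;
}
```

For the 0-byte row, delta_ticks(58, 26) is 32 and delta_percent(58, 26)
is 55; for the 64-byte row the results are 13 and 25.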

Very similar values are collected for __copy_to_user().
UDP receive performance under a small-packet flood, using recvfrom(),
increases by ~5%.

Signed-off-by: Paolo Abeni <pabeni@xxxxxxxxxx>
---
 arch/x86/include/asm/uaccess_64.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index c5504b9..16a8871 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -28,6 +28,9 @@ copy_user_generic(void *to, const void *from, unsigned len)
 {
 	unsigned ret;
 
+	if (len <= 64)
+		return copy_user_generic_unrolled(to, from, len);
+
 	/*
 	 * If CPU has ERMS feature, use copy_user_enhanced_fast_string.
 	 * Otherwise, if CPU has rep_good feature, use copy_user_generic_string.
--
2.9.4