Re: [PATCH] x86/uaccess: use unrolled string copy for short strings

From: Kees Cook
Date: Wed Jun 21 2017 - 13:38:58 EST


On Wed, Jun 21, 2017 at 4:09 AM, Paolo Abeni <pabeni@xxxxxxxxxx> wrote:
> The 'rep' prefix suffers from a significant setup cost; as a result,
> string copies using unrolled loops are faster than even the
> optimized 'rep'-based string copy for short strings.
>
> This change updates __copy_user_generic() to use the unrolled
> version for small copy lengths. The threshold length for a short
> string - 64 bytes - was selected empirically as the largest value
> that still ensures a measurable gain.
>
> A micro-benchmark of __copy_from_user() with different lengths shows
> the following:
>
> string len  vanilla  patched  delta
> bytes       ticks    ticks    ticks (%)
>
> 0 58 26 32(55%)
> 1 49 29 20(40%)
> 2 49 31 18(36%)
> 3 49 32 17(34%)
> 4 50 34 16(32%)
> 5 49 35 14(28%)
> 6 49 36 13(26%)
> 7 49 38 11(22%)
> 8 50 31 19(38%)
> 9 51 33 18(35%)
> 10 52 36 16(30%)
> 11 52 37 15(28%)
> 12 52 38 14(26%)
> 13 52 40 12(23%)
> 14 52 41 11(21%)
> 15 52 42 10(19%)
> 16 51 34 17(33%)
> 17 51 35 16(31%)
> 18 52 37 15(28%)
> 19 51 38 13(25%)
> 20 52 39 13(25%)
> 21 52 40 12(23%)
> 22 51 42 9(17%)
> 23 51 46 5(9%)
> 24 52 35 17(32%)
> 25 52 37 15(28%)
> 26 52 38 14(26%)
> 27 52 39 13(25%)
> 28 52 40 12(23%)
> 29 53 42 11(20%)
> 30 52 43 9(17%)
> 31 52 44 8(15%)
> 32 51 36 15(29%)
> 33 51 38 13(25%)
> 34 51 39 12(23%)
> 35 51 41 10(19%)
> 36 52 41 11(21%)
> 37 52 43 9(17%)
> 38 51 44 7(13%)
> 39 52 46 6(11%)
> 40 51 37 14(27%)
> 41 50 38 12(24%)
> 42 50 39 11(22%)
> 43 50 40 10(20%)
> 44 50 42 8(16%)
> 45 50 43 7(14%)
> 46 50 43 7(14%)
> 47 50 45 5(10%)
> 48 50 37 13(26%)
> 49 49 38 11(22%)
> 50 50 40 10(20%)
> 51 50 42 8(16%)
> 52 50 42 8(16%)
> 53 49 46 3(6%)
> 54 50 46 4(8%)
> 55 49 48 1(2%)
> 56 50 39 11(22%)
> 57 50 40 10(20%)
> 58 49 42 7(14%)
> 59 50 42 8(16%)
> 60 50 46 4(8%)
> 61 50 47 3(6%)
> 62 50 48 2(4%)
> 63 50 48 2(4%)
> 64 51 38 13(25%)
>
> Above 64 bytes the gain fades away.
>
> Very similar values were collected for __copy_to_user().
> UDP receive performance under flood with small packets using recvfrom()
> increases by ~5%.
>
> Signed-off-by: Paolo Abeni <pabeni@xxxxxxxxxx>

Since there are no regressions here, this seems sensible to me. :)

Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx>

-Kees

> ---
> arch/x86/include/asm/uaccess_64.h | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
> index c5504b9..16a8871 100644
> --- a/arch/x86/include/asm/uaccess_64.h
> +++ b/arch/x86/include/asm/uaccess_64.h
> @@ -28,6 +28,9 @@ copy_user_generic(void *to, const void *from, unsigned len)
> {
> unsigned ret;
>
> + if (len <= 64)
> + return copy_user_generic_unrolled(to, from, len);
> +
> /*
> * If CPU has ERMS feature, use copy_user_enhanced_fast_string.
> * Otherwise, if CPU has rep_good feature, use copy_user_generic_string.
> --
> 2.9.4
>



--
Kees Cook
Pixel Security