Re: [cpuops cmpxchg V2 5/5] cpuops: Use cmpxchg for xchg to avoid lock semantics

From: Mathieu Desnoyers
Date: Tue Dec 14 2010 - 11:35:21 EST


* Christoph Lameter (cl@xxxxxxxxx) wrote:
> Use cmpxchg instead of xchg to realize this_cpu_xchg.
>
> xchg incurs LOCK overhead because LOCK is always implied, whereas cmpxchg
> does not.
>
> Baselines:
>
> xchg() = 18 cycles (no segment prefix, LOCK semantics)
> __this_cpu_xchg = 1 cycle
>
> (simulated using this_cpu_read/write, with two prefixes. It looks like
> the CPU can use loop optimization to eliminate most of the overhead.)
>
> Cycles before:
>
> this_cpu_xchg = 37 cycles (segment prefix and LOCK (implied by xchg))
>
> After:
>
> this_cpu_xchg = 11 cycles (using cmpxchg without lock semantics)

Cool! Thanks for benchmarking these; it's really worth it.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
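
The retry loop that replaces xchg can be modelled in plain C to show how it returns the old value and why it needs no LOCK as long as only one CPU touches the variable. This is a minimal sketch of the instruction semantics, not kernel code: model_cmpxchg32() and model_xchg32() are hypothetical names, and the model is deliberately non-atomic, just like cmpxchg without a lock prefix.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of x86 "cmpxchg new, mem" with the accumulator in *acc.
 * If *mem == *acc: store new_val into *mem and return true (ZF set).
 * Otherwise: load *mem into *acc and return false (ZF clear).
 * Not atomic: it models the unlocked instruction on CPU-local data only. */
static bool model_cmpxchg32(uint32_t *mem, uint32_t *acc, uint32_t new_val)
{
    if (*mem == *acc) {
        *mem = new_val;
        return true;
    }
    *acc = *mem;
    return false;
}

/* Hypothetical xchg built on the model, mirroring the loop in the patch:
 * load the current value, attempt cmpxchg, retry while it fails.
 * Returns the previous value, as xchg does. */
static uint32_t model_xchg32(uint32_t *mem, uint32_t new_val)
{
    uint32_t acc;
    do {
        acc = *mem;                                   /* 1: mov mem, %eax */
    } while (!model_cmpxchg32(mem, &acc, new_val));   /* cmpxchg; jnz 1b  */
    return acc;
}
```

With no interference the first cmpxchg always succeeds, so the loop body runs once. The retry only matters if something on the same CPU, such as an interrupt handler, changes the variable between the load and the cmpxchg; since cmpxchg itself is a single instruction, that case is detected and retried rather than lost.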

>
> Signed-off-by: Christoph Lameter <cl@xxxxxxxxx>
>
> ---
> arch/x86/include/asm/percpu.h | 21 +++++++++++++++------
> 1 file changed, 15 insertions(+), 6 deletions(-)
>
> Index: linux-2.6/arch/x86/include/asm/percpu.h
> ===================================================================
> --- linux-2.6.orig/arch/x86/include/asm/percpu.h 2010-12-10 12:46:31.000000000 -0600
> +++ linux-2.6/arch/x86/include/asm/percpu.h 2010-12-10 13:25:21.000000000 -0600
> @@ -213,8 +213,9 @@ do { \
> })
>
> /*
> - * Beware: xchg on x86 has an implied lock prefix. There will be the cost of
> - * full lock semantics even though they are not needed.
> + * xchg is implemented using cmpxchg without a lock prefix. xchg is
> + * expensive due to the implied lock prefix. The processor cannot prefetch
> + * cachelines if xchg is used.
> */
> #define percpu_xchg_op(var, nval) \
> ({ \
> @@ -222,25 +223,33 @@ do { \
> typeof(var) __new = (nval); \
> switch (sizeof(var)) { \
> case 1: \
> - asm("xchgb %2, "__percpu_arg(1) \
> + asm("\n1:mov "__percpu_arg(1)",%%al" \
> + "\n\tcmpxchgb %2, "__percpu_arg(1) \
> + "\n\tjnz 1b" \
> : "=a" (__ret), "+m" (var) \
> : "q" (__new) \
> : "memory"); \
> break; \
> case 2: \
> - asm("xchgw %2, "__percpu_arg(1) \
> + asm("\n1:mov "__percpu_arg(1)",%%ax" \
> + "\n\tcmpxchgw %2, "__percpu_arg(1) \
> + "\n\tjnz 1b" \
> : "=a" (__ret), "+m" (var) \
> : "r" (__new) \
> : "memory"); \
> break; \
> case 4: \
> - asm("xchgl %2, "__percpu_arg(1) \
> + asm("\n1:mov "__percpu_arg(1)",%%eax" \
> + "\n\tcmpxchgl %2, "__percpu_arg(1) \
> + "\n\tjnz 1b" \
> : "=a" (__ret), "+m" (var) \
> : "r" (__new) \
> : "memory"); \
> break; \
> case 8: \
> - asm("xchgq %2, "__percpu_arg(1) \
> + asm("\n1:mov "__percpu_arg(1)",%%rax" \
> + "\n\tcmpxchgq %2, "__percpu_arg(1) \
> + "\n\tjnz 1b" \
> : "=a" (__ret), "+m" (var) \
> : "r" (__new) \
> : "memory"); \
>

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com