Re: [patch 00/21] mutex subsystem, -V14

From: Joel Schopp
Date: Thu Jan 05 2006 - 18:05:26 EST


> ISYNC_ON_SMP flushes all speculative reads currently in the queue - and is hence a smp_rmb_backwards() primitive [per my previous mail] - but does not affect writes - correct?
>
> If that's the case, what prevents a store from within the critical section moving up to right after the EIEIO_ON_SMP, but before the atomic-dec instructions? Do any of those instructions imply some barrier, perhaps? Or are writes always ordered (like on x86 CPUs), so that the store before the bne- is an effective write barrier?

It really makes more sense after reading PowerPC Book II, which was written by people who explain this for a living. You can find it here: http://www-128.ibm.com/developerworks/eserver/articles/archguide.html

While isync technically doesn't order stores, it does order instructions: no instruction after the isync can execute until the preceding bne- completes, and that bne- depends on the stwcx. having completed. So no stores from the critical section can slip up past the lock acquisition. For a better explanation you will have to read the document yourself.
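
To make that argument concrete, here is an annotated version of the acquire sequence (the same lwarx/stwcx./isync pattern the patch below uses; the function name and the comments are mine, added only to restate the ordering argument, not part of the patch):

static inline void example_atomic_dec_acquire(atomic_t *v)
{
	long tmp;

	__asm__ __volatile__(
"1:	lwarx	%0,0,%1\n"	/* load counter and set the reservation */
"	addic	%0,%0,-1\n"	/* decrement */
"	stwcx.	%0,0,%1\n"	/* store only if reservation intact, sets cr0 */
"	bne-	1b\n"		/* stwcx. failed: retry */
"	isync\n"		/* no later instruction executes until the
				 * bne- resolves, and the bne- depends on the
				 * stwcx. completing, so critical-section
				 * loads and stores cannot move up past here */
	: "=&r" (tmp)
	: "r" (&v->counter)
	: "cr0", "memory");
}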

Here is a first pass at a powerpc file for the fast paths, just as an FYI/RFC. It is completely untested, but it compiles.

Signed-off-by: Joel Schopp <jschopp@xxxxxxxxxxxxxx>



Index: 2.6.15-mutex14/include/asm-powerpc/mutex.h
===================================================================
--- 2.6.15-mutex14.orig/include/asm-powerpc/mutex.h 2006-01-04 14:46:31.%N -0600
+++ 2.6.15-mutex14/include/asm-powerpc/mutex.h 2006-01-05 16:25:41.%N -0600
@@ -1,9 +1,83 @@
/*
- * Pull in the generic implementation for the mutex fastpath.
+ * include/asm-powerpc/mutex.h
*
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
+ * PowerPC optimized mutex locking primitives
+ *
+ * Please look into asm-generic/mutex-xchg.h for a formal definition.
+ * Copyright (C) 2006 Joel Schopp <jschopp@xxxxxxxxxxxxxx>, IBM
*/
+#ifndef _ASM_MUTEX_H
+#define _ASM_MUTEX_H
+
+#include <asm/synch.h>	/* for SYNC_ON_SMP */
+
+#define __mutex_fastpath_lock(count, fail_fn) \
+do { \
+	long tmp; \
+	__asm__ __volatile__( \
+"1:	lwarx	%0,0,%1\n" \
+"	addic	%0,%0,-1\n" \
+"	stwcx.	%0,0,%1\n" \
+"	bne-	1b\n" \
+"	isync\n"		/* acquire barrier */ \
+	: "=&r" (tmp) \
+	: "r" (&(count)->counter) \
+	: "cr0", "memory"); \
+	if (unlikely(tmp < 0)) \
+		fail_fn(count); \
+} while (0)
+
+#define __mutex_fastpath_unlock(count, fail_fn) \
+do { \
+	long tmp; \
+	__asm__ __volatile__( \
+	SYNC_ON_SMP		/* release barrier */ \
+"1:	lwarx	%0,0,%1\n" \
+"	addic	%0,%0,1\n" \
+"	stwcx.	%0,0,%1\n" \
+"	bne-	1b\n" \
+	: "=&r" (tmp) \
+	: "r" (&(count)->counter) \
+	: "cr0", "memory"); \
+	if (unlikely(tmp <= 0)) \
+		fail_fn(count); \
+} while (0)
+
+
+static inline int
+__mutex_fastpath_trylock(atomic_t *count, int (*fail_fn)(atomic_t *))
+{
+	long tmp;
+
+	__asm__ __volatile__(
+"1:	lwarx	%0,0,%1\n"	/* load the count */
+"	cmpwi	0,%0,1\n"	/* only a count of 1 means unlocked */
+"	bne-	2f\n"		/* locked or contended: fail */
+"	stwcx.	%2,0,%1\n"	/* try to store 0, taking the lock */
+"	bne-	1b\n"		/* lost the reservation: retry */
+"	isync\n"		/* acquire barrier */
+"2:"
+	: "=&r" (tmp)
+	: "r" (&(count)->counter), "r" (0)
+	: "cr0", "memory");
+
+	/* success only if we observed 1 and stored 0 */
+	return tmp == 1;
+}
+
+#define __mutex_slowpath_needs_to_unlock() 1

-#include <asm-generic/mutex-dec.h>
+static inline int
+__mutex_fastpath_lock_retval(atomic_t *count, int (*fail_fn)(atomic_t *))
+{
+	long tmp;
+
+	__asm__ __volatile__(
+"1:	lwarx	%0,0,%1\n"
+"	addic	%0,%0,-1\n"
+"	stwcx.	%0,0,%1\n"
+"	bne-	1b\n"
+"	isync\n"
+	: "=&r" (tmp)
+	: "r" (&(count)->counter)
+	: "cr0", "memory");
+
+	if (unlikely(tmp < 0))
+		return fail_fn(count);
+	return 0;
+}
+#endif
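
For context (not part of the patch): a rough sketch of how the mutex core is expected to call these fastpath primitives. The example_* wrapper names are mine; the slowpath entry points follow the __mutex_lock_slowpath/__mutex_unlock_slowpath naming used by kernel/mutex.c in the mutex patch series.

static inline void example_mutex_lock(struct mutex *lock)
{
	/* uncontended case: one atomic dec plus the acquire barrier */
	__mutex_fastpath_lock(&lock->count, __mutex_lock_slowpath);
}

static inline void example_mutex_unlock(struct mutex *lock)
{
	/* release barrier plus one atomic inc; slowpath only if waiters exist */
	__mutex_fastpath_unlock(&lock->count, __mutex_unlock_slowpath);
}

The fail_fn is only invoked when the counter indicates contention, so the common-case lock/unlock cost stays at a single ll/sc sequence plus the appropriate barrier.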