Re: [PATCH] - Fix memory ordering problem in wake_futex()

From: Joe Seigh
Date: Fri Dec 23 2005 - 12:03:38 EST


Jack Steiner wrote:
Here is a fix for a ugly race condition that occurs in wake_futex() on IA64.

On IA64, locks are released using a "st.rel" instruction. This ensures that
preceding "stores" are visible before the lock is released but does NOT prevent
a "store" that follows the "st.rel" from becoming visible before the "st.rel".
The result is that the task that owns the futex_q continues prematurely.

The failure I saw is the task that owned the futex_q resumed prematurely and
was context-switch off of the cpu. The task's switch_stack occupied the same
space of the futex_q. The store to q->lock_ptr overwrote the ar.bspstore in the
switch_stack. When the task resumed, it ran with a corrupted ar.bspstore.
Things went downhill from there.

Without the fix, the application fails roughly every 10 minutes. With
the fix, it ran 16 hours without a failure.


I'm not sure I understand. Mutex semantics allow for memory accesses to
be moved into the critical region but not vice versa. This is true for Java
and it's pretty much agreed by all the "experts" that it's allowed in Posix
if there was such a thing as a formal definition of mutex semantics
in Posix. It would also seem to be to be the reason why Intel designed the
release semantics that way, so the hardware, not just the compiler, could
move code into the critical region for better performance.

So I suspect the problem is something else.


--
Joe Seigh

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/