[PATCH 5/5] futex: fix wakeup race by setting TASK_INTERRUPTIBLE before queue_me

From: Darren Hart
Date: Tue Sep 22 2009 - 01:30:55 EST


PI futexes do not use the same plist_node_empty() test for wakeup. It was
possible for the waiter (in futex_wait_requeue_pi()) to set TASK_INTERRUPTIBLE
after the waker assigned the rtmutex to the waiter. The waiter would then note
the plist was not empty and call schedule(). The task would not be found by any
subsequent futex wakeups, resulting in a userspace hang. By moving the
setting of TASK_INTERRUPTIBLE to before the call to queue_me(), the race with
the waker is eliminated. Since we no longer call get_user() from within
queue_me(), there is no need to delay the setting of TASK_INTERRUPTIBLE until
after the call to queue_me().
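
The ordering argument above can be illustrated with a small, deterministic
userspace model. This is only a sketch under stated assumptions: the types
and helpers (struct waiter, pi_wake(), would_sleep()) are hypothetical
stand-ins and not kernel code, and the waker is modelled, as described
above, as marking the task runnable while leaving it on the list.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for kernel state (not kernel types). */
enum task_state { RUNNING, INTERRUPTIBLE };

struct waiter {
	enum task_state state;
	bool queued;	/* models !plist_node_empty(&q->list) */
};

/*
 * Models the PI-style wakeup described above: the task is marked
 * runnable, but the waiter still sees itself on the list afterwards.
 * A waiter that has not been queued yet cannot be found at all.
 */
static void pi_wake(struct waiter *w)
{
	if (w->queued)
		w->state = RUNNING;
}

/*
 * Models the final test plus schedule(): the waiter blocks only if it
 * is still queued and its state is not RUNNING. Since the single wakeup
 * has already been delivered, blocking here means a hang.
 */
static bool would_sleep(const struct waiter *w)
{
	return w->queued && w->state == INTERRUPTIBLE;
}

/* Old order: queue_me(), then set_current_state(); waker runs in between. */
static bool old_order_hangs(void)
{
	struct waiter w = { RUNNING, false };

	w.queued = true;		/* queue_me() */
	pi_wake(&w);			/* waker races in here */
	w.state = INTERRUPTIBLE;	/* overwrites the wakeup */
	return would_sleep(&w);
}

/* New order: set_current_state(), then queue_me(). */
static bool new_order_hangs(void)
{
	struct waiter w = { RUNNING, false };

	w.state = INTERRUPTIBLE;	/* set_current_state() */
	w.queued = true;		/* queue_me() */
	pi_wake(&w);			/* earliest point the waker can run */
	return would_sleep(&w);
}
```

In this model old_order_hangs() returns true and new_order_hangs() returns
false: once the state is set before queueing, a wakeup can only arrive after
the state is already TASK_INTERRUPTIBLE, so it cannot be overwritten.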

The FUTEX_LOCK_PI operation is not affected as futex_lock_pi() relies entirely
on the rtmutex code to handle schedule() and wakeup. The requeue PI code is
affected because the waiter starts as a non-PI waiter and is woken on a PI
futex.

Remove the crusty old comment about holding spinlocks across get_user() as we
no longer do that. Correct the locking statement with a description of why the
test is performed.

Signed-off-by: Darren Hart <dvhltc@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxx>
CC: Eric Dumazet <eric.dumazet@xxxxxxxxx>
CC: Dinakar Guniguntala <dino@xxxxxxxxxx>
CC: John Stultz <johnstul@xxxxxxxxxx>
---

kernel/futex.c | 15 +++------------
1 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index f92afbe..463af2e 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1656,17 +1656,8 @@ out:
 static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
 				struct hrtimer_sleeper *timeout)
 {
-	queue_me(q, hb);
-
-	/*
-	 * There might have been scheduling since the queue_me(), as we
-	 * cannot hold a spinlock across the get_user() in case it
-	 * faults, and we cannot just set TASK_INTERRUPTIBLE state when
-	 * queueing ourselves into the futex hash. This code thus has to
-	 * rely on the futex_wake() code removing us from hash when it
-	 * wakes us up.
-	 */
 	set_current_state(TASK_INTERRUPTIBLE);
+	queue_me(q, hb);
 
 	/* Arm the timer */
 	if (timeout) {
@@ -1676,8 +1667,8 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
 	}
 
 	/*
-	 * !plist_node_empty() is safe here without any lock.
-	 * q.lock_ptr != 0 is not safe, because of ordering against wakeup.
+	 * If we have been removed from the hash list, then another task
+	 * has tried to wake us, and we can skip the call to schedule().
 	 */
 	if (likely(!plist_node_empty(&q->list))) {
 		/*
