Re: [RFC patch 4/5] futex: Enqueue waiter before user space check

From: Darren Hart
Date: Mon Nov 25 2013 - 19:21:16 EST


On Mon, 2013-11-25 at 20:58 +0000, Thomas Gleixner wrote:


This changes the queue ordering on the waiter side from
>
> lock(hash_bucket);
> validate user space value();
> queue();
> unlock(hash_bucket);
>
> to
> lock(hash_bucket);
> queue();
> validate user space value();
> unlock(hash_bucket);
>
> This is a preparatory patch for a lockless empty check of the hash
> bucket plist.
>
> No functional implication. All futex operations are still serialized
> via the hasb bucket lock.
>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>

Reviewed-by: Darren Hart <dvhart@xxxxxxxxxxxxxxx>

...

> + *
> + * The futex_lock_pi ordering is similar to that, but it has the
queue
> + * operation right before unlocking hash bucket lock and scheduling.
>

There is also futex_wait_requeue_pi(), as you mentioned earlier.

I spent some time looking into this as I thought I had something in
place to prevent futex_wake from waking pending requeue_pi futex_qs.

futex_wait_requeue_pi() sets up a q.rt_waiter pointer which
distinguishes requeue_pi waiters from other waiters.

futex_wake() will abort if it detects the rt_waiter, returning EINVAL as
it is an invalid op pairing.

In this case, the spin_lock is adequate for futex_wait_requeue_pi()
because the wake wouldn't be performed regardless, so while the race
still exists, it isn't relevant. Worst case, the futex_wake returns 0,
indicating the list was empty and it didn't wake anything, rather than
-EINVAL, indicating the invalid op pairing. This is a delta from today's
behavior, but it is in a rare error path, and the result is no more
damaging than the existing behavior.


The approach appears sound to me. Now I'll get to looking at testing.

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/