Re: Futex queue_me/get_user ordering

From: Jakub Jelinek
Date: Thu Mar 17 2005 - 10:56:49 EST


On Thu, Mar 17, 2005 at 03:20:31PM +0000, Jamie Lokier wrote:
> If you change futex_wait to be "atomic", and then have userspace locks
> which _depend_ on that atomicity, it becomes impossible to wait on
> multiple of those locks, or make poll-driven state machines which can
> wait on those locks.

The futex man pages that have been around for years (certainly since mid 2002)
certainly don't document FUTEX_WAIT as token passing operation, but as atomic
operation:

Say http://www.icewalkers.com/Linux/ManPages/futex-2.html

FUTEX_WAIT
This operation atomically verifies that the futex
address still contains the value given, and sleeps
awaiting FUTEX_WAKE on this futex address. If the
timeout argument is non-NULL, its contents describe
the maximum duration of the wait, which is infinite
otherwise. For futex(4), this call is executed if
decrementing the count gave a negative value (indi
cating contention), and will sleep until another
process releases the futex and executes the
FUTEX_WAKE operation.

RETURN VALUE
FUTEX_WAIT
Returns 0 if the process was woken by a FUTEX_WAKE
call. In case of timeout, ETIMEDOUT is returned. If
the futex was not equal to the expected value, the
operation returns EWOULDBLOCK. Signals (or other
spurious wakeups) cause FUTEX_WAIT to return EINTR.

so there very well might be programs other than glibc that
depend on this behaviour. Given that in most cases the race
is not hit every day (after all, we have been living with it for
several years), they probably wouldn't know there is a problem
like that.

> You can do userspace threading and simulate most blocking system calls
> by making them non-blocking and using poll).

Sure, but then you need to write your own locking as well and
can just use the token passing property of futexes there.

> It's not a _huge_ loss, but considering it's only Glibc which is
> demanding this and futexes have another property, token-passing, which
> Glibc could be using instead - why not use it?

Because that requires requeue being done with the cv lock held, which
means an extra context switch.

> > @@ -265,7 +264,6 @@ static inline int get_futex_value_locked
> > inc_preempt_count();
> > ret = __copy_from_user_inatomic(dest, from, sizeof(int));
> > dec_preempt_count();
> > - preempt_check_resched();
> >
> > return ret ? -EFAULT : 0;
> > }
>
> inc_preempt_count() and dec_preempt_count() aren't needed, as
> preemption is disabled by the queue spinlocks. So
> get_futex_value_locked isn't needed any more: with the spinlocks held,
> __get_user will do.

They aren't needed if CONFIG_PREEMPT. But with !CONFIG_PREEMPT, they
are IMHO still needed, as spin_lock/spin_unlock call preempt_{disable,enable},
which is a nop if !CONFIG_PREEMPT.
__get_user can't be used though, it should be __get_user_inatomic
(or __copy_from_user_inatomic if the former doesn't exist).

> > [numerous instances of...]
> > + preempt_check_resched();
>
> Not required. The spin unlocks will do this.

True, preempt_check_resched() is a nop if !CONFIG_PREEMPT and for
CONFIG_PREEMPT spin_unlock will handle it. Will remove them from the
patch.

> > But with the recent changes to futex.c I think kernel can ensure
> > atomicity for free.
>
> I agree it would probably not slow the kernel, but I would _strongly_
> prefer that Glibc were fixed to use the token-passing property, if
> Glibc is the driving intention behind this patch - instead of this
> becoming a semantic that application-level users of futex (like
> database and IPC libraries) come to depend on and which can't be
> decomposed into a multiple-waiting form.
>
> (I admit that the kernel code does look nicer with
> get_futex_value_locked gone, though).
>
> By the way, do you know of Scott Snyder's recent work on fixing Glibc
> in this way? He bumped into one of Glibc's currently broken corner
> cases, fixed it (according to the algorithm I gave in November), and
> reported that it works fine with the fix.

I certainly haven't seen his patch.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/