Re: Futex queue_me/get_user ordering
From: Hidetoshi Seto
Date: Tue Nov 16 2004 - 03:30:03 EST
OMG... Wait, wait... Don't do anything.
I have to deeply apologize to all for my mistake.
If my understanding is correct, this bug is "2.4 futex"(RHEL3) *SPECIFIC*!!
I had swallow the story that 2.6 futex has the same problem...
So I realize that 2.6 futex never behave:
>> "returns 0 if the futex was not equal to the expected value, but
>> the process was woken by a FUTEX_WAKE call."
Update of manpage is now unnecessary, I think.
#
First of all, I would appreciate if you could read my old post:
"Kernel bug in futex_wait, cause application hang with NPTL"
http://www.ussg.iu.edu/hypermail/linux/kernel/0409.0/2044.html
#
Then, let's go on to the main subject.
Jamie Lokier wrote:
> In fact, waiting does not get the lock for the futex. It relies on
> the ordering of (1) adding to the wait queue, (2) checking the current
> value, and (3) removing from the wait queue if the value doesn't
> match. Among other things, this is necessary because checking the
> current value cannot be done with a spinlock held.
If my understanding is correct, 2.6 futex does not get any spinlocks,
but a semaphore:
[kernel/futex.c](from 2.6, RHEL4b2)
286 static int futex_wake(unsigned long uaddr, int nr_wake)
287 {
:
294 down_read(¤t->mm->mmap_sem);
:
306 wake_futex(this);
:
314 up_read(¤t->mm->mmap_sem);
315 return ret;
316 }
:
477 static int futex_wait(unsigned long uaddr, int val, unsigned long time)
478 {
:
483 down_read(¤t->mm->mmap_sem);
:
489 queue_me(&q, -1, NULL);
:
500 if (curval != val) {
501 ret = -EWOULDBLOCK;
502 goto out_unqueue;
503 }
:
509 up_read(¤t->mm->mmap_sem);
:
528 time = schedule_timeout(time);
:
536 /* If we were woken (and unqueued), we succeeded, whatever. */
537 if (!unqueue_me(&q))
538 return 0;
539 if (time == 0)
540 return -ETIMEDOUT;
541 /* A spurious wakeup should never happen. */
542 WARN_ON(!signal_pending(current));
543 return -EINTR;
544
545 out_unqueue:
546 /* If we were woken (and unqueued), we succeeded, whatever. */
547 if (!unqueue_me(&q))
548 ret = 0;
549 out_release_sem:
550 up_read(¤t->mm->mmap_sem);
551 return ret;
552 }
This semaphore prevents a waiter which temporarily queued to check the val
from being target of wakeup.
So my "[simulation]" is wrong if it is on 2.6, since wake_Y never be able to
touch the queue while wait_A is in the queue to have the val to be checked.
(If it is not possible that there are threads which go around with same
futex/condvar but each have different mmap_sem,) 2.6 futex is quite good.
#
Next, let's see how about 2.4 futex:
[kernel/futex.c](from 2.4, RHEL3U2)
154 static inline int futex_wake(unsigned long uaddr, int offset, int num)
155 {
:
160 lock_futex_mm();
:
176 wake_up_all(&this->waiters);
:
185 unlock_futex_mm();
:
188 return ret;
189 }
:
310 static inline int futex_wait(unsigned long uaddr,
311 int offset,
312 int val,
313 unsigned long time)
314 {
:
323 lock_futex_mm();
:
330 __queue_me(&q, page, uaddr, offset, -1, NULL);
:
342 if (curval != val) {
343 unlock_futex_mm();
344 ret = -EWOULDBLOCK;
345 goto out;
346 }
:
357 unlock_futex_mm();
358 time = schedule_timeout(time);
:
365 if (time == 0) {
366 ret = -ETIMEDOUT;
367 goto out;
368 }
369 if (signal_pending(current))
370 ret = -EINTR;
371 out:
372 /* Were we woken up anyway? */
373 if (!unqueue_me(&q))
374 ret = 0;
375 put_page(q.page);
376
377 return ret;
:
383 }
2.4 futex uses spinlocks.
74 static inline void lock_futex_mm(void)
75 {
76 spin_lock(¤t->mm->page_table_lock);
77 spin_lock(&vcache_lock);
78 spin_lock(&futex_lock);
79 }
80
81 static inline void unlock_futex_mm(void)
82 {
83 spin_unlock(&futex_lock);
84 spin_unlock(&vcache_lock);
85 spin_unlock(¤t->mm->page_table_lock);
86 }
However, this spinlocks fail to prevent topical waiters from wakeups.
Because the spinlocks are released *before* unqueue_me(&q) (line 343 & 373).
So this failure allows wake_Y to touch the queue while wait_A is in it.
Of course as you know, this brings bug which I have mentioned.
(I don't know how many distributions have 2.4 futex in itself, but)
At least 2.4 futex in RHEL3U2 is buggy.
#
I regret that I could not notice this fact earlier.
I'm sorry... I hope you'll accept my apology.
Thanks,
H.Seto
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/