Re: [PATCH RT 2/3] swait: Add memory barrier before checking list empty

From: Steven Rostedt
Date: Mon Aug 19 2013 - 12:49:17 EST


On Mon, 19 Aug 2013 11:51:55 -0400
Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:

> On Mon, 19 Aug 2013 11:35:32 -0400
> Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> > From: Steven Rostedt <rostedt@xxxxxxxxxxx>
> >
> > There's a race condition with swait wakeups and adding to the list. The
> > __swait_wake() does a check for swait_head_has_waiters(), and if it is
> > empty it will exit without doing any wake ups. The problem is that the
> > check does not include any memory barriers before it makes a decision
> > to wake up or not.
> >
> >        CPU0                              CPU1
> >        ----                              ----
> >
> >  condition = 1
> >
> >  load h->list (is empty)
> >                                    raw_spin_lock(hlist->lock)
> >                                    hlist_add();
> >                                    __set_current_state();
> >                                    raw_spin_unlock(hlist->lock)
> >  swait_wake()
> >    swait_head_has_waiters()
> >     (sees h->list as empty and returns)
> >
>
> BTW, the race still exists if you move the raw_spin_unlock(hlist->lock)
> above to here. That is:
>
>  swait_wake()
>    swait_head_has_waiters()
>     (sees h->list as empty and returns)
>
>                                    raw_spin_unlock(hlist->lock)
>
>
> Maybe this will help to understand it more.

As with all memory barrier issues, this can be confusing. To try to
describe it a bit better: the bug happens because the waker's load of
h->list can happen before its store of the condition.

The wakeup code requires that the condition be written out first, and
only then should the check for waiters happen (that check being whether
h->list is empty or not).
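
In caller terms the required order is roughly this (a sketch only; the
swait_wake() and swait_head_has_waiters() names are from the patch, the
condition variable and struct layout are just illustrative):

	/* waker side, in program order */
	condition = 1;       /* 1) make the event visible              */
	swait_wake(&head);   /* 2) the wake path checks
	                      *    swait_head_has_waiters(), i.e.
	                      *    whether h->list is empty             */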

To prevent missing a wakeup, the waiter will add itself to the wait
queue (h->list), then check the condition. If the condition is already
set, it puts its state back and continues without calling schedule().
If the condition is not set, it is safe to call schedule(), because the
task that is to wake it up will do so after setting the condition. But
that assumes the waker knows to wake it up (because the waiter is on
the list).
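
The waiter side, in the same rough sketch form (again, the field and
state names here are illustrative, not the exact wait-simple API):

	raw_spin_lock_irqsave(&head->lock, flags);
	hlist_add_head(&w->node, &head->list);     /* visible to the waker's
	                                            * list check             */
	__set_current_state(TASK_UNINTERRUPTIBLE);
	raw_spin_unlock_irqrestore(&head->lock, flags);

	if (condition)
		__set_current_state(TASK_RUNNING); /* already set: put the
		                                    * state back, no sleep   */
	else
		schedule();                        /* safe only if the waker
		                                    * will see us on h->list */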

If the condition is stored after the check for waiters, then we cannot
guarantee that the task waiting for the condition will be woken up.
What happens instead is that the waker checks the list, sees nothing on
it, and returns (the condition is still in the process of being stored
by the CPU, but hasn't yet made it to memory). The waiter then adds
itself to the list to be woken, and checks the condition. Since the
condition hasn't been written out yet, it goes to sleep, and this time
nobody is there to wake it up.

By adding a memory barrier before checking the list to see if anyone
is waiting for the condition, we force the condition out to memory, so
it is seen by the waiter either before it goes to sleep or after it has
added itself to the wait list. Either the waiter sees that the
condition is set, or the waker sees that the waiter is on the list. In
either case, the waiter gets woken up.
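
So the fix in this patch amounts to something like this in the wake
path (a sketch of what the changelog describes; the actual diff may
differ in detail):

	/* __swait_wake(), after the caller has stored the condition */
	smp_mb();   /* pairs with the waiter's add-then-check: either we
	             * see the waiter on h->list, or the waiter sees the
	             * condition before sleeping                          */
	if (!swait_head_has_waiters(head))
		return 0;   /* nobody queued, and any later waiter is now
		             * guaranteed to see the condition            */
	/* ... take head->lock and wake the tasks queued on h->list ... */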

Now, patch 3/3 handles the case where the waiter may read the
condition before it adds itself to the list. This too is bad, because
the waiter can read the condition before the waker sets it, and the
waker can then read the list before the waiter's addition is visible.
That also results in a missed wakeup.

Luckily, that case isn't an issue on x86, because raw_spin_unlock()
prevents it from happening: on x86, raw_spin_unlock() is a full memory
barrier. But it may not be on other architectures. If an architecture's
unlock only flushes out what has been written, it does not prevent a
read from being performed early, which would let the condition be read
before h->list is stored. That's why I kept patch 3/3 separate from
this patch. The bug this patch fixes exists on all SMP architectures,
whereas the one 3/3 fixes exists only on specific architectures.
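
For completeness, the 3/3 case is the waiter-side mirror image, roughly
(a sketch; whether the early read can actually happen depends on the
architecture's unlock ordering, as described above):

	/* ... lock, hlist_add(), __set_current_state() as before ... */
	raw_spin_unlock(&head->lock);   /* full barrier on x86; elsewhere it
	                                 * may only order the earlier writes */
	if (!condition)                 /* this load may be performed before
	                                 * the h->list store is visible,
	                                 * mirroring the waker-side race     */
		schedule();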

-- Steve


>
> >                                    check_condition (sees condition = 0)
> >
> >  store condition = 1
> >
> >                                    schedule()
> >
> > Now the task on CPU1 has just missed its wakeup. By adding a memory
> > barrier before the list empty check, we fix the problem of failing to
> > see that the list is not empty, as well as pushing the condition out
> > for the other task to see.
> >
> > Reviewed-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> > Signed-off-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
