Re: [PATCH] Documentation: Remove misleading examples of the barriers in wake_*()

From: Boqun Feng
Date: Fri Sep 11 2015 - 13:00:27 EST


Hi Oleg,

On Thu, Sep 10, 2015 at 07:55:57PM +0200, Oleg Nesterov wrote:
> On 09/10, Boqun Feng wrote:
> >
> > On Wed, Sep 09, 2015 at 12:28:22PM -0700, Paul E. McKenney wrote:
> > > My feeling is
> > > that we should avoid saying too much about the internals of wait_event()
> > > and wake_up().
>
> I feel the same. I simply can't understand what we are trying to
> document ;)
>

What I think we should document here is what memory ordering guarantee
we can rely on with these sleep/wakeup primitives, or what kind of
barriers these primitives imply. Because the structure of the
memory-barriers.txt here is:

(*) Implicit kernel memory barriers.

- Locking functions.
- Interrupt disabling functions.
->- Sleep and wake-up functions.<-
- Miscellaneous functions.

> For example,
>
> > A STORE-LOAD barrier is implied after setting task state by wait-related functions:
> >
> > prepare_to_wait();
> > prepare_to_wait_exclusive();
> > prepare_to_wait_event();
>
> I won't argue, but to me this looks misleading too.
>
> Yes, prepare_to_wait()->set_current_state() implies mb() and thus
> a STORE-LOAD barrier.
>
> But this has nothing to do with the explanation above. We do not
> need this barrier to avoid the race with wake_up(). Again, again,
> we can safely rely on wq->lock and acquire/release semantics.
>

Yes, you are right. prepare_to_wait*() should be put here. What should
be put here is set_current_state(), whose STORE-LOAD barrier pairs with
the STORE-LOAD barrier of wake_up_process().

> This barrier is only needed if you do, say,
>
> CONDITION = 1;
>
> if (waitqueue_active(wq))
> wake_up(wq);
>
> And note that the code above is wrong without another mb() after
> CONDITION = 1.
>

Understood, I admit I didn't realize this before.

To summarize, we have three kinds of data related to sleep/wakeup:

* CONDITIONs: global data used to indicate events

* task states

* wait queues(may not be used, if users use set_current_state() +
schedule() to sleep and wake_up_process() to wake up)

IIUC, the race on wait queues are almost avoided because of wq->locks,
and if a wait queue is used, race on task states are avoided because
states are readed and written with a wq->lock held in sleep/wakeup
functions. So only in two cases we need STORE-LOAD barriers to avoid the
race:

1. no wait queue used(e.g. rcu_boost_kthread), we need STORE-LOAD
to order accesses to task states and CONDITIONs, so we have
barriers in wake_up_process() and set_current_state().

2. wait queue accessed without a wq->lock held(e.g. your example),
we need STORE-LOAD to order accesses to wait queues and CONDITIONs

Since case #1 still exists in kernel, we'd better keep this section in
memory-barriers.txt, however, I'm not sure whether we should mention
case #2 in this section.

Here is a modified version, without mentioning case #2:


SLEEP AND WAKE-UP FUNCTIONS
---------------------------

Sleeping and waking on an event flagged in global data can be viewed as an
interaction between two pieces of data: the task state of the task waiting for
the event and the global data used to indicate the event. To make sure that
these appear to happen in the right order, the primitives to begin the process
of going to sleep, and the primitives to initiate a wake up imply certain
barriers.

If a wait queue is used, all accesses to task states are protected by the lock
of the wait queue, so the race on task states are avoided. However, if no wait
queue used, we need some memory ordering guantanee to avoid the race between
sleepers/wakees and wakers.

The memory ordering requirement here can be expressed by two STORE-LOAD
barriers(barriers which can guarantee a STORE perceding it can never be
reordered after a LOAD following it). One STORE-LOAD barrier is needed on the
sleeper/wakee side, before reading a variable used to indicate the event and
after setting the state of the current task. Another STORE-LOAD barrier is
needed on the waker side, before reading the state of the wakee task and after
setting a variable used to indicate the event. These two barriers can pair with
each other to avoid race conditions between sleepers/wakees and wakers:

sleepr/wakee on CPU 1 waker on CPU 2
======================== ========================
{ wakee->state = TASK_RUNNING, event_indicated = 0 }
STORE current->state=TASK_INTERRUPTIBLE
<STORE-LOAD barrier>
c = LOAD event_indicated
STORE event_indicated=1
<STORE-LOAD barrier>
s = LOAD wakee->state

assert(!(c==0 && s == TASK_RUNNING));

A STORE-LOAD barrier is implied after setting task state in set_current_state()
and before reading task state in wake_up_process()

Make sure call set_current_state() before read the global data used to indicate
event and sleep, and call wake_up_process() after set the global data used to
indicate a event.


Regards,
Boqun

Attachment: signature.asc
Description: PGP signature