Re: Fwd: [PATCH] sched: Distinguish sched_wakeup event when wake up a task which did schedule out or not.

From: Peter Zijlstra
Date: Sun May 11 2014 - 12:35:45 EST


On Sun, May 11, 2014 at 11:24:22PM +0800, Dongsheng Yang wrote:
> Actually, this patch does not attempt to solve the race condition.
> It only wants to avoid sched:sched_wakeup with success==true in
> a fake wakeup, as explained below.
>
> > So the fundamental wait loop is:
> >
> > 	for (;;) {
> > 		set_current_state(TASK_UNINTERRUPTIBLE);
> > 		if (cond)
> > 			break;
> > 		schedule();
> > 	}
> > 	__set_task_state(TASK_RUNNING);
> >
> > And the fundamental wakeup is:
> >
> > cond = true;
> > wake_up_process(TASK_NORMAL);
> >
> > And this is very much on purpose a lock-free but strictly ordered
> > scenario. It is a variation of:
> >
> > 	X = Y = 0
> >
> > 	(wait)			(wake)
> > 	[w] X = 1		[w] Y = 1
> > 	MB			MB
> > 	[r] Y			[r] X
> >
> > 	[ where: X := state, Y := cond ]
> >
> > And we all 'know' that the only provided guarantee is that:
> > X==0 && Y==0
> > is impossible -- but only that, all 3 other states are observable.
> >
> > This guarantee means that it's impossible to both miss the condition and
> > the wakeup; iow. it guarantees fwd progress.
> >
> > OTOH it's fundamentally racy, nothing guarantees we will not 'observe' both
> > the condition and the wakeup.
> >
> > Setting .success=false when ->on_rq is set is actively wrong. Suppose
> > the waiter has already observed cond==false but has not yet gotten to
> > schedule(), at that point the wakeup happens and sees ->on_rq==1. The
> > wakeup is still very much a real wakeup.
>
>
> Yes, if a wakeup happens before schedule(), wakeup
> sees ->on_rq==1. Then we can get an event with .success==false.
> But I think it is not a real wakeup. :(
>
> Yes, at this moment, maybe the task is already off the run queue.
> But *this* wakeup did not move it back to the run queue, it only
> changed its state to TASK_RUNNING. I believe the next
> wakeup for this task will do the real wakeup, moving it back
> to the run queue.
>
> And if the scheduler really wakes it up, we can get an event with success==true.
>
> Anyway, what I want with this patch is to make scheduler raise accurate
> events when waking up a task.
>
> If a wakeup only changes the state of the task, raise an event with success==false.
> If a wakeup moves a task back to the runqueue, .success==true.
>
> It means we do not need to care whether the task is currently on_rq or not;
> the value of .success is decided by what we actually did in
> try_to_wake_up().
>
> I hope I have explained myself clearly.

So if the wait side has already observed cond==false, then without the
wakeup, which still potentially has ->on_rq == true, it would block.
Therefore the wakeup is a _real_ wakeup.

We fundamentally cannot know, on the wake side, if the wait side has or
has not observed cond, and therefore the distinction you're trying to
make is a false one.
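
For anyone who wants to check the litmus test quoted above by hand, here is a
minimal userspace sketch, not kernel code. It assumes the MB on each side makes
the four accesses behave as if they were sequentially consistent, so every
execution is some interleaving of the two program orders. It enumerates those
interleavings and records which pairs of read values can occur. The names
explore(), reachable_outcomes() and check_guarantee() are made up for
illustration.

```c
#include <assert.h>

/*
 * Each side runs two ops in program order: op 0 is the store, op 1 is
 * the load. wait: [w] X = 1; [r] Y.  wake: [w] Y = 1; [r] X.
 * Outcome index = 2 * (Y as read by wait) + (X as read by wake).
 */
static void explore(int wait_pc, int wake_pc, int X, int Y,
		    int r_wait, int r_wake, unsigned *seen)
{
	if (wait_pc == 2 && wake_pc == 2) {
		*seen |= 1u << (2 * r_wait + r_wake);
		return;
	}

	/* Let the wait side take the next step, if it has one left. */
	if (wait_pc == 0)
		explore(1, wake_pc, 1, Y, r_wait, r_wake, seen);  /* [w] X = 1 */
	else if (wait_pc == 1)
		explore(2, wake_pc, X, Y, Y, r_wake, seen);       /* [r] Y */

	/* Let the wake side take the next step, if it has one left. */
	if (wake_pc == 0)
		explore(wait_pc, 1, X, 1, r_wait, r_wake, seen);  /* [w] Y = 1 */
	else if (wake_pc == 1)
		explore(wait_pc, 2, X, Y, r_wait, X, seen);       /* [r] X */
}

/* Bitmask of reachable (r_wait, r_wake) outcomes, starting from X = Y = 0. */
unsigned reachable_outcomes(void)
{
	unsigned seen = 0;

	explore(0, 0, 0, 0, 0, 0, &seen);
	return seen;
}

void check_guarantee(void)
{
	unsigned seen = reachable_outcomes();

	/* Bit 0 is "both reads saw 0": condition and wakeup both missed. */
	assert(!(seen & 1u));
	/* The other three outcomes are all reachable. */
	assert(seen == 0xEu);
}
```

Bit 3 is the case where the waiter sees cond and the waker also sees the
state, which is the "observe both" race. Nothing on the wake side can tell
that case apart from the one where the waiter read cond==false just before.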
