Re: [PATCH 2/4] nohz: Synchronize sleep time stats with seqlock

From: Peter Zijlstra
Date: Tue Aug 20 2013 - 13:54:45 EST


On Tue, Aug 20, 2013 at 06:33:12PM +0200, Oleg Nesterov wrote:
> On 08/20, Peter Zijlstra wrote:

> I am wondering how the extra lock(rq)/unlock(rq) in schedule() is bad
> compared to atomic_dec.
>
> IOW, what if we simply make rq->nr_iowait "int" and change schedule()
> to update it? Something like below. Just curious.
>
> As for nr_iowait_local + nr_iowait_remote, this doesn't look safe...
> in theory nr_iowait_cpu() or even nr_iowait() can return a negative
> number.

I'm too tired to see how its different, I'll think on it :-)


That said:

> --- x/kernel/sched/core.c
> +++ x/kernel/sched/core.c
> @@ -2435,6 +2435,9 @@ need_resched:
> rq->curr = next;
> ++*switch_count;
>
> + if (unlikely(prev->in_iowait))
> + rq->nr_iowait++;
> +
> context_switch(rq, prev, next); /* unlocks the rq */
> /*
> * The context switch have flipped the stack from under us
> @@ -2442,6 +2445,12 @@ need_resched:
> * this task called schedule() in the past. prev == current
> * is still correct, but it can be moved to another cpu/rq.
> */
> + if (unlikely(prev->in_iowait)) {
> + raw_spin_lock_irq(&rq->lock);
> + rq->nr_iowait--;
> + raw_spin_unlock_irq(&rq->lock);
> + }

This seems like the wrong place, this is where you return from
schedule() running another task, not where the task you just send to
sleep wakes up.

If you want to to this, the nr_iowait decrement needs to be in the
wakeup path someplace.

> cpu = smp_processor_id();
> rq = cpu_rq(cpu);
> } else
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/