Re: [PATCH 2/2] sched: update runqueue clock before migrations away

From: Chris Redpath
Date: Tue Dec 17 2013 - 09:09:22 EST


On 12/12/13 18:24, Peter Zijlstra wrote:
> Would pre_schedule_idle() -> rq_last_tick_reset() -> rq->last_sched_tick
> be useful?
>
> I suppose we could easily lift that to NO_HZ_COMMON.


Many thanks for the tip, Peter. I have tried this out and it does provide enough information to correct the problem. The new version doesn't update the rq; it just carries the extra unaccounted time (estimated from jiffies) over to be processed during enqueue.

However, before I send a new patch set, I have a question about the existing behavior. Ben, you may already know the answer to this.

During a wake migration we call __synchronize_entity_decay in migrate_task_rq_fair, which will decay avg.runnable_avg_sum. We also record the number of periods we decayed for as a negative number in avg.decay_count.

We then enqueue the task on its target runqueue, and again we decay the load by the number of periods it has been off-rq.

        if (unlikely(se->avg.decay_count <= 0)) {
                se->avg.last_runnable_update = rq_clock_task(rq_of(cfs_rq));
                if (se->avg.decay_count) {
                        se->avg.last_runnable_update -= (-se->avg.decay_count)
                                                        << 20;
>>>                     update_entity_load_avg(se, 0);

Am I misunderstanding how this is supposed to work, or have we always been double-accounting sleep time for wake migrations?

