Re: [PATCH RFC] sched/fair: let cpu's cfs_rq to reflect task migration

From: Dietmar Eggemann
Date: Thu Apr 07 2016 - 16:30:24 EST


Hi Vincent,

On 04/07/2016 02:04 PM, Vincent Guittot wrote:
Hi Dietmar,

On 6 April 2016 at 20:53, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
On 06/04/16 09:37, Morten Rasmussen wrote:
On Tue, Apr 05, 2016 at 06:00:40PM +0100, Dietmar Eggemann wrote:

[...]

@@ -2910,8 +2920,13 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
if (!entity_is_task(se))
return;

- rq_of(cfs_rq)->cfs.avg.util_avg += se->avg.util_avg;
- rq_of(cfs_rq)->cfs.avg.util_sum += se->avg.util_sum;
+ if (&rq_of(cfs_rq)->cfs == cfs_rq) {
+ rq_of(cfs_rq)->cfs.avg.util_avg += se->avg.util_avg;
+ rq_of(cfs_rq)->cfs.avg.util_sum += se->avg.util_sum;
+ } else {
+ rq_of(cfs_rq)->cfs.added_util_avg = se->avg.util_avg;
+ rq_of(cfs_rq)->cfs.added_util_sum = se->avg.util_sum;
+ }
}

Don't you also need similar thing for the detach ?

Maybe? I ran workloads in tg's and checked last_update_time of cfs_rq
and &rq_of(cfs_rq)->cfs and they always were in sync. That's obviously
only the call-stack 'task_move_group_fair() -> detach_task_cfs_rq() ->
detach_entity_load_avg()' and not the one starting from
switched_from_fair().

[...]

But attach_entity_load_avg() is not only called in enqueue_entity_load_avg() for migrated
tasks but also in attach_task_cfs_rq() which is called from switched_to_fair() and
task_move_group_fair() where we can't assume that after the enqueue_entity_load_avg() a
call to update_cfs_rq_load_avg() follows like in

enqueue_task_fair():

for_each_sched_entity(se)
enqueue_entity()
enqueue_entity_load_avg()
update_cfs_rq_load_avg(now, cfs_rq)
if (migrated) attach_entity_load_avg()

for_each_sched_entity(se)
update_load_avg()
update_cfs_rq_load_avg(now, cfs_rq)


Not sure if we can just update the root cfs_rq to se->avg.last_update_time before we add
se->avg.util_[avg/sum] to rq_of(cfs_rq)->cfs.avg.util_[avg/sum] in attach_entity_load_avg()?

cfs_rq throttling has to be considered as well ...

IIUC this new proposal, the utilization of a task will be accounted on
the utilization of the root cfs_rq thanks to
tsk->se->cfs_rq->tg->se[cpu]->... down to the root cfs_rq. Then, you
directly add the utilization of the newly enqueued task in the root
cfs_rq.

Not sure if you're referring to this, but in __update_load_avg() I
suppress the utilization update for se's w/ !entity_is_task(se) and
cfs_rq's w/ &rq_of(cfs_rq)->cfs != cfs_rq so preventing the first case.

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.