Re: [PATCH 2/2] sched: Rewrite per entity runnable load average tracking

From: Yuyang Du
Date: Fri Jul 11 2014 - 00:11:30 EST


On Thu, Jul 10, 2014 at 10:06:27AM -0700, bsegall@xxxxxxxxxx wrote:

> So, sched_clock(_cpu) can be arbitrarily far off of cfs_rq_clock_task, so you
> can't really do that. Ideally, yes, you would account for any time since
> the last update and account that time as !runnable. However, I don't
> think there is any good way to do that, and the current code doesn't.

Yeah. We only catch the migrating task up to its cfs_rq and then subtract. There is
no catching up to the "current" time.

> >
> > I made another mistake. Should not only track task entity load, group entity
> > (as an entity) is also needed. Otherwise, task_h_load can't be done correctly...
> > Sorry for the messup. But this won't make much change in the codes.
>
> This will increase it to 2x __update_load_avg per cgroup per
> enqueue/dequeue. What does this (and this patch in general) do to
> context switch cost at cgroup depth 1/2/3?

We can update the cfs_rq load_avg, and let the cfs_rq's own se take a ride in that
update. These two should be exactly synchronized anyway (the group se's load is only
useful for the task_h_load calc, and the group cfs_rq's load is useful for the
task_h_load and update_cfs_share calc). And technically, it looks easy:

To update cfs_rq, the update weight is cfs_rq->load.weight
To update its se, the update weight is cfs_rq->tg->se[cpu]->load.weight * on_rq

So it will not increase to 2x, but to 1.05x, maybe, :)
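
A rough user-space sketch of the "take a ride" idea, to show why the extra cost
should be small: the elapsed time is computed once and the same decay step is
applied to both sums, only the weight differs. This is a toy model, not kernel
code. The struct, the function name and the halve-per-period decay are all
assumptions made for illustration; the real decay series is different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: every name here is hypothetical, not a kernel symbol. */
struct toy_avg {
	uint64_t last_update;	/* time of the last update, in whole periods */
	uint64_t cfs_rq_sum;	/* decayed sum weighted by the cfs_rq weight */
	uint64_t se_sum;	/* decayed sum weighted by the group se weight */
};

/*
 * Update the cfs_rq sum and its own se sum in a single pass.
 * cfs_rq_weight stands in for cfs_rq->load.weight, and
 * se_weight * on_rq stands in for cfs_rq->tg->se[cpu]->load.weight * on_rq.
 * Decay is simplified to "halve once per period" (an assumption).
 */
static void toy_update_pair(struct toy_avg *a, uint64_t now,
			    unsigned long cfs_rq_weight,
			    unsigned long se_weight, int on_rq)
{
	uint64_t periods;
	unsigned long w_se = se_weight * (on_rq ? 1UL : 0UL);

	assert(a != NULL);
	if (now <= a->last_update)
		return;		/* nothing elapsed, nothing to account */

	/* The elapsed time is computed once and shared by both sums. */
	periods = now - a->last_update;
	a->last_update = now;

	while (periods--) {
		a->cfs_rq_sum = a->cfs_rq_sum / 2 + cfs_rq_weight;
		a->se_sum = a->se_sum / 2 + w_se;
	}
}
```

The second sum costs one extra line per period in the shared loop, rather than
a second full update with its own time delta.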

Thanks,
Yuyang