Re: [RFC PATCH] sched: fix hierarchical order in rq->leaf_cfs_rq_list

From: Dietmar Eggemann
Date: Wed May 25 2016 - 13:40:23 EST


Hi Vincent,

On 24/05/16 10:55, Vincent Guittot wrote:
> Fix the insertion of cfs_rq in rq->leaf_cfs_rq_list to ensure that
> a child will always called before its parent.
>
> The hierarchical order in shares update list has been introduced by
> commit 67e86250f8ea ("sched: Introduce hierarchal order on shares update list")
>
> With the current implementation a child can be still put after its parent.
>
> Lets take the example of
> root
> \
> b
> /\
> c d*
> |
> e*
>
> with root -> b -> c already enqueued but not d -> e so the leaf_cfs_rq_list
> looks like: head -> c -> b -> root -> tail
>
> The branch d -> e will be added the first time that they are enqueued,
> starting with e then d.
>
> When e is added, its parents is not already on the list so e is put at the
> tail : head -> c -> b -> root -> e -> tail
>
> Then, d is added at the head because its parent is already on the list:
> head -> d -> c -> b -> root -> e -> tail
>
> e is not placed at the right position and will be called the last whereas
> it should be called at the beginning.
>
> Because it follows the bottom-up enqueue sequence, we are sure that we
> will finished to add either a cfs_rq without parent or a cfs_rq with a parent
> that is already on the list. We can use this event to detect when we have
> finished to add a new branch. For the others, whose parents are not already
> added, we have to ensure that they will be added after their children that
> have just been inserted the steps before, and after any potential parents that
> are already in the list. The easiest way is to put the cfs_rq just after the
> last inserted one and to keep track of it untl the branch is fully added.
>
> Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>

So the use case for this would be you create a multi-level task group
hierarchy on a cpu (e.g. tg_css_id=2,4,6 on cpu=1) and let a task run in
the highest level task group (tg_css_id=6).

list_add_leaf_cfs_rq() call:

...
enqueue_task_fair: cpu=1 tg_css_id=6 cfs_rq=0xffffffc97500f700 on_list=0
enqueue_task_fair: cpu=1 tg_css_id=4 cfs_rq=0xffffffc975175900 on_list=0
enqueue_task_fair: cpu=1 tg_css_id=2 cfs_rq=0xffffffc975866d00 on_list=0
enqueue_task_fair: cpu=1 tg_css_id=1 cfs_rq=0xffffffc97fec44e8 on_list=1

...

In this case, the for_each_leaf_cfs_rq() in update_blocked_averages()
iterates in the tg_css_id=2,1,6,4 order:

...
update_blocked_averages: tg_css_id=2 cfs_rq=0xffffffc975866d00 on_list=1
update_blocked_averages: tg_css_id=1 cfs_rq=0xffffffc97fec44e8 on_list=1
update_blocked_averages: tg_css_id=6 cfs_rq=0xffffffc97500f700 on_list=1
update_blocked_averages: tg_css_id=4 cfs_rq=0xffffffc975175900 on_list=1
...

which I guess breaks the update_tg_cfs_util() call you introduced in
update_blocked_averages() in '[RFC PATCH v2] sched: reflect sched_entity
movement into task_group's utilization'. IMHO, otherwise
update_blocked_averages() can deal with the list not being ordered
tg_css_id=6,4,2,1.

https://lkml.org/lkml/2016/5/24/200

The use of for_each_leaf_cfs_rq() in update_shares() is gone. Do the
remaining call sites (update_runtime_enabled(),
unthrottle_offline_cfs_rqs() require any ordering?

[...]