Re: Crash in list_add_leaf_cfs_rq due to bad tmp_alone_branch

From: Vincent Guittot
Date: Fri Jan 18 2019 - 05:16:52 EST


On Wed, 9 Jan 2019 at 23:43, Sargun Dhillon <sargun@xxxxxxxxx> wrote:
>
> On Wed, Jan 9, 2019 at 2:14 PM Sargun Dhillon <sargun@xxxxxxxxx> wrote:
> >
> > I picked up c40f7d74c741a907cfaeb73a7697081881c497d0 sched/fair: Fix
> > infinite loop in update_blocked_averages() by reverting a9e7f6544b9c
> > and put it on top of 4.19.13. In addition to this, I uninlined
> > list_add_leaf_cfs_rq for debugging.
> >
> > This revealed a new bug that we didn't get to because we kept getting
> > crashes from the previous issue. When we are running with cgroups that
> > are rapidly changing, with CFS bandwidth control, and in addition
> > using the cpusets cgroup, we see this crash. Specifically, it seems to
> > occur with cgroups that are throttled and we change the allowed
> > cpuset.

Thanks for the context, I will try to reproduce the problem and
understand how we can stop in the middle of walking to the
sched_entity branch with a parent not already added

How many cgroup level have you got in you setup ?

> >
>
> This patch from Gabriel should fix the problem:
>
>
> [PATCH] sched/fair: Reset tmp_alone_branch on cfs_rq delete
>
> When a child cfs_rq is added to the leaf cfs_rq list before its parent
> tmp_alone_branch is set to point to the child in preparation for the
> parent being added.
>
> If the child is deleted before the parent is added then tmp_alone_branch
> points to a freed cfs_rq. Any future reference to tmp_alone_branch will
> result in a use after free.

So, the patch below is a temporary fix that helps to recover from the
situation where tmp_alone_branch doesn't finished back to
rq->leaf_cfs_rq_list
But this situation should not happened at the beginning

>
> Signed-off-by: Gabriel Hartmann <gabriel.hartmann@xxxxxxxxx>
> Reported-by: Sargun Dhillon <sargun@xxxxxxxxx>
> ---
> kernel/sched/fair.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7137bc343b4a..0987629cbb76 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -347,6 +347,11 @@ static inline void list_add_leaf_cfs_rq(struct
> cfs_rq *cfs_rq)
> static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
> {
> if (cfs_rq->on_list) {
> + struct rq *rq = rq_of(cfs_rq);
> +
> + if (rq->tmp_alone_branch == &cfs_rq->leaf_cfs_rq_list)
> + rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;
> +
> list_del_rcu(&cfs_rq->leaf_cfs_rq_list);
> cfs_rq->on_list = 0;
> }