Re: Crash in list_add_leaf_cfs_rq due to bad tmp_alone_branch

From: Vincent Guittot
Date: Fri Jan 18 2019 - 09:06:43 EST


On Fri, 18 Jan 2019 at 11:16, Vincent Guittot
<vincent.guittot@xxxxxxxxxx> wrote:
>
> On Wed, 9 Jan 2019 at 23:43, Sargun Dhillon <sargun@xxxxxxxxx> wrote:
> >
> > On Wed, Jan 9, 2019 at 2:14 PM Sargun Dhillon <sargun@xxxxxxxxx> wrote:
> > >
> > > I picked up c40f7d74c741a907cfaeb73a7697081881c497d0 sched/fair: Fix
> > > infinite loop in update_blocked_averages() by reverting a9e7f6544b9c
> > > and put it on top of 4.19.13. In addition to this, I uninlined
> > > list_add_leaf_cfs_rq for debugging.

With the fix above applied, the code that manages the leaf_cfs_rq_list
is the same since v4.9.
Have you noticed similar problem on other older kernel version between
v4.9 and v4.19 ? The problem might have been introduce while modifying
other part of the scheduler like the sequence for adding/removing
cgroup.

Knowing the most recent kernel version without the problem could help
to narrow the problem

Thanks,
Vincent

> > >
> > > This revealed a new bug that we didn't get to because we kept getting
> > > crashes from the previous issue. When we are running with cgroups that
> > > are rapidly changing, with CFS bandwidth control, and in addition
> > > using the cpusets cgroup, we see this crash. Specifically, it seems to
> > > occur with cgroups that are throttled and we change the allowed
> > > cpuset.
>
> Thanks for the context, I will try to reproduce the problem and
> understand how we can stop in the middle of walking to the
> sched_entity branch with a parent not already added
>
> How many cgroup level have you got in you setup ?
>
> > >
> >
> > This patch from Gabriel should fix the problem:
> >
> >
> > [PATCH] sched/fair: Reset tmp_alone_branch on cfs_rq delete
> >
> > When a child cfs_rq is added to the leaf cfs_rq list before its parent
> > tmp_alone_branch is set to point to the child in preparation for the
> > parent being added.
> >
> > If the child is deleted before the parent is added then tmp_alone_branch
> > points to a freed cfs_rq. Any future reference to tmp_alone_branch will
> > result in a use after free.
>
> So, the patch below is a temporary fix that helps to recover from the
> situation where tmp_alone_branch doesn't finished back to
> rq->leaf_cfs_rq_list
> But this situation should not happened at the beginning
>
> >
> > Signed-off-by: Gabriel Hartmann <gabriel.hartmann@xxxxxxxxx>
> > Reported-by: Sargun Dhillon <sargun@xxxxxxxxx>
> > ---
> > kernel/sched/fair.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 7137bc343b4a..0987629cbb76 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -347,6 +347,11 @@ static inline void list_add_leaf_cfs_rq(struct
> > cfs_rq *cfs_rq)
> > static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
> > {
> > if (cfs_rq->on_list) {
> > + struct rq *rq = rq_of(cfs_rq);
> > +
> > + if (rq->tmp_alone_branch == &cfs_rq->leaf_cfs_rq_list)
> > + rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;
> > +
> > list_del_rcu(&cfs_rq->leaf_cfs_rq_list);
> > cfs_rq->on_list = 0;
> > }