Re: NULL pointer dereference in pick_next_task_fair

From: Peter Zijlstra
Date: Mon Oct 28 2019 - 17:49:26 EST


On Mon, Oct 28, 2019 at 05:46:03PM +0000, Quentin Perret wrote:

> The issue is very transient and relatively hard to reproduce.
>
> After digging a bit, the offending commit seems to be:
>
> 67692435c411 ("sched: Rework pick_next_task() slow-path")
>
> By 'offending' I mean that reverting it makes the issue go away. The
> issue comes from the fact that pick_next_entity() returns a NULL se in
> the 'simple' path of pick_next_task_fair(), which causes obvious
> problems in the subsequent call to set_next_entity().
>
> I'll dig more, but if anybody understands the issue in the meantime feel
> free to send me a patch to try out :)

The only way for pick_next_entity() to return NULL is if the tree is
empty and !cfs_rq->curr. But in that case, cfs_rq->nr_running _should_
be 0 and/or its related se should not be enqueued in the parent cfs_rq.

Now for the root cfs_rq we check nr_running and jump to the idle
path, however if this occurs in the middle of the hierarchy, we're up a
creek without a paddle. This is something that really should not
happen (because an empty cfs_rq should not be enqueued).
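
To make the failure mode concrete, here is a rough userspace sketch of the
walk down the hierarchy. It is a toy model, not the kernel code: the toy_*
structures and functions are made up for illustration, a single 'leftmost'
pointer stands in for the rbtree, and the walk reports failure where the
real code would go on to dereference the NULL se.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the scheduler structures; NOT the kernel definitions. */
struct toy_cfs_rq;

struct toy_se {
	struct toy_cfs_rq *my_q;	/* group runqueue owned by this se, NULL for a task */
};

struct toy_cfs_rq {
	struct toy_se *curr;		/* currently running entity, if any */
	struct toy_se *leftmost;	/* stand-in for the leftmost node of the tree */
	unsigned int nr_running;
};

/* Returns NULL only if the tree is empty and there is no curr. */
static struct toy_se *toy_pick_next_entity(struct toy_cfs_rq *cfs_rq)
{
	if (cfs_rq->leftmost)
		return cfs_rq->leftmost;
	return cfs_rq->curr;
}

/*
 * Walk from the root cfs_rq down to a task.
 * Returns false when an empty cfs_rq is found below the root, which is
 * the state that should never exist. *out is the picked task, or NULL
 * when the root is empty (the idle case).
 */
static bool toy_pick_task(struct toy_cfs_rq *root, struct toy_se **out)
{
	struct toy_cfs_rq *cfs_rq = root;
	struct toy_se *se;

	*out = NULL;
	if (!root->nr_running)
		return true;		/* only the root is checked: idle path */

	do {
		se = toy_pick_next_entity(cfs_rq);
		if (!se)
			return false;	/* empty cfs_rq in the middle of the hierarchy */
		cfs_rq = se->my_q;	/* descend into the group, if this is one */
	} while (cfs_rq);

	*out = se;
	return true;
}
```

With a root that holds one group se whose own cfs_rq is empty,
toy_pick_task() returns false. An empty root just takes the idle path.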

Also, if we take the simple path, as you say, then we'll have done a
put_prev_task(), regardless of how we got there, so we know cfs_rq->curr
must be NULL. Which, with the above, means the tree really is empty.

And as stated above, when the tree is empty and !cfs_rq->curr, the
cfs_rq's se should not be enqueued in the parent cfs_rq so we should not
be getting here.

Clearly something is buggered with the cgroup state. What is your cgroup
setup? Are you using cpu-bandwidth?