May I ask a question about scheduler (sched_rt.c)?

From: MING ZHOU
Date: Mon Oct 31 2011 - 23:46:51 EST


Hi all,

May I ask a question about scheduler (sched_rt.c)? I want to make
sure a patch related in kernel-linux mailing list is valid or not.
http://lkml.org/lkml/2008/9/25/189

I encountered a kernel panic recently which caused by BUG_ON in
pick_next_pushable_task ( my kernel version is 2.6.35, on arm
platform).

static struct task_struct *pick_next_pushable_task(struct rq *rq)
{
...
BUG_ON(task_current(rq, p)); <------------ panic here!!!
...
}

<4>[17583.180664] [<c00a3888>] (pick_next_pushable_task+0x4c/0xa4)
from [<c00ae21c>] (push_rt_task+0x20/0x264)
<4>[17583.180725] [<c00ae21c>] (push_rt_task+0x20/0x264) from
[<c00ae554>] (post_schedule_rt+0x14/0x20)
<4>[17583.180816] [<c00ae554>] (post_schedule_rt+0x14/0x20) from
[<c069352c>] (schedule+0x738/0x7c8)


I checked patch history related to push_rt_task, and I think the
following patch may be the reason, since if dequeue task improperly,
it may ruin task pointer by mistake.

https://lkml.org/lkml/2011/8/14/71
Commit-ID: 311e800e16f63d909136a64ed17ca353a160be59
Author: Hillf Danton <dhillf@xxxxxxxxx>

sched, rt: Fix rq->rt.pushable_tasks bug in push_rt_task()

Do not call dequeue_pushable_task() when failing to push an eligible
task, as it remains pushable, merely not at this particular moment.

Signed-off-by: Hillf Danton <dhillf@xxxxxxxxx>
Signed-off-by: Mike Galbraith <mgalbraith@xxxxxx>
Signed-off-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Cc: Yong Zhang <yong.zhang0@xxxxxxxxx>

And I also noticed in commit history of sched_rt.c, I found the
similar patch is submitted before at 2008.
However, it was not picked up in latest kernel code. So, I am
wondering whether this patch is valid?

commit 1563513d34ed4b12ef32bc2adde4a53ce05701a1
Author: Gregory Haskins <ghaskins@xxxxxxxxxx>
Date: Mon Dec 29 09:39:53 2008 -0500

RT: fix push_rt_task() to handle dequeue_pushable properly

A panic was discovered by Chirag Jog where a BUG_ON sanity check
in the new "pushable_task" logic would trigger a panic under
certain circumstances:

http://lkml.org/lkml/2008/9/25/189

Gilles Carry discovered that the root cause was attributed to the
pushable_tasks list getting corrupted in the push_rt_task logic.
This was the result of a dropped rq lock in double_lock_balance
allowing a task in the process of being pushed to potentially migrate
away, and thus corrupt the pushable_tasks() list.

I traced back the problem as introduced by the pushable_tasks patch
that went in recently. There is a "retry" path in push_rt_task()
that actually had a compound conditional to decide whether to
retry or exit. I missed the meaning behind the rationale for the
virtual "if(!task) goto out;" portion of the compound statement and
thus did not handle it properly. The new pushable_tasks logic
actually creates three distinct conditions:

1) an untouched and unpushable task should be dequeued
2) a migrated task where more pushable tasks remain should be retried
3) a migrated task where no more pushable tasks exist should exit

The original logic mushed (1) and (3) together, resulting in the
system dequeuing a migrated task (against an unlocked foreign run-queue
nonetheless).

To fix this, we get rid of the notion of "paranoid" and we support the
three unique conditions properly. The paranoid feature is no longer
relevant with the new pushable logic (since pushable naturally limits
the loop) anyway, so lets just remove it.

Reported-By: Chirag Jog <chirag@xxxxxxxxxxxxxxxxxx>
Found-by: Gilles Carry <gilles.carry@xxxxxxxx>
Signed-off-by: Gregory Haskins <ghaskins@xxxxxxxxxx>


Best Regards,
Jane Zhou
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/