Re: [PATCH 2/2] sched/deadline: always enqueue on previous rq when dl_task_timer fires

From: Wanpeng Li
Date: Wed Feb 25 2015 - 20:01:38 EST


On Tue, Feb 24, 2015 at 09:28:35AM +0000, Juri Lelli wrote:
>dl_task_timer() may fire on a different rq from where a task was removed
>after throttling. Since the call path is:
>
> dl_task_timer() ->
> enqueue_task_dl() ->
> enqueue_dl_entity() ->
> replenish_dl_entity()
>
>and replenish_dl_entity() uses dl_se's rq, we can't use current's rq
>in dl_task_timer(), but we need to lock the task's previous one.
>
>Signed-off-by: Juri Lelli <juri.lelli@xxxxxxx>

Tested-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxxxxxx>

I see a panic when I run a dl task, kill it after several seconds, and
then repeat the process several times. The bug is triggered by
commit 3960c8c0c789 ("sched: Make dl_task_time() use task_rq_lock()");
Juri's patch fixes it.

[ 313.352676] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 313.353483] IP: [<ffffffff8139ee28>] rb_erase+0x118/0x390
[ 313.354060] PGD b5ddb067 PUD b5d96067 PMD 0
[ 313.354501] Oops: 0002 [#1] SMP
[...]
[ 313.356633] Call Trace:
[ 313.356633] [<ffffffff810b2cb7>] dequeue_pushable_dl_task+0x47/0x80
[ 313.356633] [<ffffffff810b46ff>] pick_next_task_dl+0x7f/0x150
[ 313.356633] [<ffffffff8178f7b9>] __schedule+0x839/0x8cb
[ 313.356633] [<ffffffff8178f947>] schedule+0x37/0x90
[ 313.356633] [<ffffffff8178fbae>] schedule_preempt_disabled+0xe/0x10
[ 313.356633] [<ffffffff810b5b18>] cpu_startup_entry+0x168/0x380
[ 313.356633] [<ffffffff810eb2e3>] ? clockevents_register_device+0xe3/0x150
[ 313.356633] [<ffffffff810eba96>] ? clockevents_config_and_register+0x26/0x30
[ 313.356633] [<ffffffff8104a96c>] start_secondary+0x14c/0x170
[ 313.356633] Code: e2 fc 74 ab 48 89 c1 48 89 d0 48 8b 50 08 48 39 ca 74 48 f6 02 01 75 b3 48 8b 4a 10 48 89 c7 48 83 cf 01 48 89 48 08 48 89 42 10 <48> 89 39 48 8b 38 48 89 3a 48 83 e7 fc 48 89 10 0f 84 02 01 00
[ 313.356633] RIP [<ffffffff8139ee28>] rb_erase+0x118/0x390
[ 313.356633] RSP <ffff8800ba3efdc8>
[ 313.356633] CR2: 0000000000000000
[ 313.356633] ---[ end trace 5fbbfdbbc196604d ]---
[ 313.356633] Kernel panic - not syncing: Attempted to kill the idle task!
[ 313.356633] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)

>Cc: Ingo Molnar <mingo@xxxxxxxxxx>
>Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>Cc: Kirill Tkhai <ktkhai@xxxxxxxxxxxxx>
>Cc: Juri Lelli <juri.lelli@xxxxxxxxx>
>Cc: linux-kernel@xxxxxxxxxxxxxxx
>Fixes: 3960c8c0c789 ("sched: Make dl_task_time() use task_rq_lock()")
>---
> kernel/sched/deadline.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
>index dbf12a9..519e468 100644
>--- a/kernel/sched/deadline.c
>+++ b/kernel/sched/deadline.c
>@@ -538,7 +538,7 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
> unsigned long flags;
> struct rq *rq;
>
>- rq = task_rq_lock(current, &flags);
>+ rq = task_rq_lock(p, &flags);
>
> /*
> * We need to take care of several possible races here:
>@@ -593,7 +593,7 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
> push_dl_task(rq);
> #endif
> unlock:
>- task_rq_unlock(rq, current, &flags);
>+ task_rq_unlock(rq, p, &flags);
>
> return HRTIMER_NORESTART;
> }
>--
>2.3.0
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to majordomo@xxxxxxxxxxxxxxx
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at http://www.tux.org/lkml/