Re: [BUGFIX][PATCH] Fix sched rt group scheduling when hierarchy is enabled

From: Yong Zhang
Date: Mon Mar 07 2011 - 02:00:30 EST


On Fri, Mar 4, 2011 at 8:11 PM, Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx> wrote:
> I based the changes on what I saw during my debugging/test. I
> explained it earlier,
>
> Everyone is dequeued
>
> 1. child runs first, finds parent throttled, so it does not queue
> anything on parent group. child is unthrottled and rt_time now becomes
> 0, parent's rt_nr_running is not incremented.
> 2. Parent timer runs, it is unthrottled, its group->rt_nr_running is 0
> hence enqueue is not called

I tested with the attached patch (web mail would mangle it inline)
applied on top of yours, but I failed to trigger that WARNING.

Below are my steps:
1) mount -t cgroup -ocpu cpu /mnt
2) mkdir /mnt/test-1
3) mkdir /mnt/test-1-1
4) set cpu.rt_runtime_us to 100000 for test-1 and test-1-1
5) run a loop task and attach it to test-1-1

So I worked out a scenario that matches your description,
but it is based on the unpatched kernel (without your patch):
Let's assume a dual-core system with test-1/test-1-1
as the rt groups, a loop task running on CPU 1, and test-1
and test-1-1 both throttled.

CPU-0                                   CPU-1
do_sched_rt_period_timer(test-1-1)
{
        for CPU-1
                unthrottle test-1-1.rt_rq[1];
                but fail to enqueue it because
                we always get test-1-1.rt_se[0]
                due to smp_processor_id();
                thus test-1.rt_rq[1].nr_running == 0;
                and it returned with rt_time == 0;
}
do_sched_rt_period_timer(test-1)
        unthrottle test-1.rt_rq[1] but
        fail to enqueue test-1.rt_rq[1];
        because nr_running == 0;

So if we have your patch for issue-1, then when
the hrtimer runs on CPU-1, test-1-1
and test-1 will be queued because of the
additional check in the rt_time == 0 case.

But once we have your patch for issue-2, the above
problem will be killed by it, right?

Correct me if I'm wrong :)

Thanks,
Yong


--
Only stand for myself
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 01f75a5..7e4839c 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -568,8 +568,12 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
 			raw_spin_unlock(&rt_rq->rt_runtime_lock);
 		} else if (rt_rq->rt_nr_running) {
 			idle = 0;
-			if (!rt_rq_throttled(rt_rq))
+			if (!rt_rq_throttled(rt_rq)) {
+				int cpu = cpu_of(rq_of_rt_rq(rt_rq));
+
+				WARN_ON(!on_rt_rq(rt_rq->tg->rt_se[cpu]));
 				enqueue = 1;
+			}
 		}

 		if (enqueue)