Re: [git pull] scheduler fixes

From: Mike Galbraith
Date: Wed Jan 21 2009 - 07:40:44 EST


On Sun, 2009-01-18 at 19:54 +0100, Mike Galbraith wrote:
> On Sun, 2009-01-18 at 16:28 +0100, Peter Zijlstra wrote:
> >
> > If however your workload consists of cpu hogs, each will run for the
> > full wakeup preemption 'slice' you now see these buddy pairs do.
>
> Hm. I had a whack buddy tags if you are one at tick in there, but
> removed it pending measurement. I was wondering if a last buddy hog
> could end up getting the CPU back after having received his quanta and
> being resched, but haven't checked that yet.

Dunno if this really needs fixing, but it does happen, and frequently.

Buddies can be selected over waiting tasks despite having just received their
full slice and more. Fix this by clearing the buddy tag in put_prev_entity()
or check_preempt_tick() if they've received their fair share.

Clear buddy status once a task has received it's fair share.

Signed-off-by: Mike Galbraith <efault@xxxxxx>

---
kernel/sched_fair.c | 33 ++++++++++++++++++++++++++++++++-
1 file changed, 32 insertions(+), 1 deletion(-)

Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -768,8 +768,10 @@ check_preempt_tick(struct cfs_rq *cfs_rq

ideal_runtime = sched_slice(cfs_rq, curr);
delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
- if (delta_exec > ideal_runtime)
+ if (delta_exec >= ideal_runtime) {
+ clear_buddies(cfs_rq, curr);
resched_task(rq_of(cfs_rq)->curr);
+ }
}

static void
@@ -818,6 +820,33 @@ static struct sched_entity *pick_next_en
return se;
}

+static void cond_clear_buddy(struct cfs_rq *cfs_rq, struct sched_entity *prev)
+{
+ s64 delta_exec = prev->sum_exec_runtime;
+ u64 min = sysctl_sched_min_granularity;
+
+ /*
+ * We need to clear buddy status if the previous task has received it's
+ * fair share, but we don't want to increase overhead significantly for
+ * fast/light tasks by calling sched_slice() too frequently.
+ */
+ if (unlikely(prev->load.weight != NICE_0_LOAD)) {
+ struct load_weight load;
+
+ load.weight = prio_to_weight[NICE_TO_PRIO(0) - MAX_RT_PRIO];
+ load.inv_weight = prio_to_wmult[NICE_TO_PRIO(0) - MAX_RT_PRIO];
+ min = calc_delta_mine(min, prev->load.weight, &load);
+ }
+
+ delta_exec -= prev->prev_sum_exec_runtime;
+
+ if (delta_exec > min) {
+ delta_exec -= sched_slice(cfs_rq, prev);
+ if (delta_exec >= 0)
+ clear_buddies(cfs_rq, prev);
+ }
+}
+
static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev)
{
/*
@@ -829,6 +858,8 @@ static void put_prev_entity(struct cfs_r

check_spread(cfs_rq, prev);
if (prev->on_rq) {
+ if (prev == cfs_rq->next || prev == cfs_rq->last)
+ cond_clear_buddy(cfs_rq, prev);
update_stats_wait_start(cfs_rq, prev);
/* Put 'current' back into the tree. */
__enqueue_entity(cfs_rq, prev);


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/