Re: [patch 1/2] fix perf. bug in wake-up load balancing for aim7 and db workload

From: Peter Williams
Date: Wed Feb 15 2006 - 22:55:20 EST


Nick Piggin wrote:
> Chen, Kenneth W wrote:
>> Revert commit d7102e95b7b9c00277562c29aad421d2d521c5f6,
>> which causes more than 10% performance regression with aim7.
>
> Just to be sure, what kernel did you test with? In particular,
> did it have the smpnice patch reverted (as -rc3 does)?


Analysis of the smpnice code indicates that it could cause anomalous cpu selection decisions in try_to_wake_up() if there is a skew in the distribution of nice values among the tasks on the cpus under consideration. Attached for review is a proposed patch to address this problem. In particular, I request comments on the following issues:

1. Is this potential problem worth worrying about?
2. Do you agree with my decision to replace SCHED_LOAD_SCALE with the average load per task for this_cpu in the if statement in try_to_wake_up(), or should I be using the average load per task for the task's current cpu in one or both places? (A small worked sketch of the issue follows this list.)
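
For context, here is a small self-contained sketch (not kernel code) of the effect I'm describing: with smpnice, a run queue's weighted load depends on the nice values of its tasks, so comparing that load against the fixed SCHED_LOAD_SCALE ("roughly one nice-0 task's worth") misjudges queues whose tasks are skewed towards high or low nice. The task_weight() scaling below is a made-up approximation purely for illustration, not the actual prio_bias/weighted_load mapping, and the SCHED_LOAD_SCALE value is nominal.

/*
 * Illustrative sketch only -- not kernel code.  Shows how the average
 * load per task on a runqueue diverges from SCHED_LOAD_SCALE when the
 * nice distribution is skewed.
 */
#include <stdio.h>

#define SCHED_LOAD_SCALE 128UL	/* nominal weight of one nice-0 task (value illustrative) */

/* Hypothetical per-task weight: nice 0 -> SCHED_LOAD_SCALE, each nice
 * step scaling by ~1.25.  Purely for illustration, not the kernel's table. */
static unsigned long task_weight(int nice)
{
	double w = SCHED_LOAD_SCALE;
	int steps = nice < 0 ? -nice : nice;

	for (int i = 0; i < steps; i++)
		w = nice < 0 ? w * 1.25 : w / 1.25;
	return (unsigned long)w;
}

/* Sum of the weights of the tasks "on" a toy runqueue. */
static unsigned long rq_load(const int *nices, int n)
{
	unsigned long load = 0;

	for (int i = 0; i < n; i++)
		load += task_weight(nices[i]);
	return load;
}

int main(void)
{
	int boosted[] = { -10, -10 };		/* two high priority tasks   */
	int niced[]   = { 10, 10, 10, 10 };	/* four niced-down tasks     */

	unsigned long l0 = rq_load(boosted, 2);
	unsigned long l1 = rq_load(niced, 4);

	printf("cpu0: load=%lu avg/task=%lu (SCHED_LOAD_SCALE=%lu)\n",
	       l0, l0 / 2, SCHED_LOAD_SCALE);
	printf("cpu1: load=%lu avg/task=%lu (SCHED_LOAD_SCALE=%lu)\n",
	       l1, l1 / 4, SCHED_LOAD_SCALE);
	return 0;
}

On a skewed pair of queues like these, the average load per task tracks what a "typical" task on that queue actually weighs, which is the quantity the wake-affine test in try_to_wake_up() really wants to compare against, rather than the fixed nice-0 weight.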

Signed-off-by: Peter Williams <pwil3058@xxxxxxxxxxxxxx>

Peter
--
Peter Williams pwil3058@xxxxxxxxxxxxxx

"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
Index: MM-2.6.X/kernel/sched.c
===================================================================
--- MM-2.6.X.orig/kernel/sched.c 2006-02-16 12:39:30.000000000 +1100
+++ MM-2.6.X/kernel/sched.c 2006-02-16 14:36:24.000000000 +1100
@@ -1061,6 +1061,18 @@ static inline unsigned long target_load(
}

/*
+ * Return the average load per task on the cpu's run queue
+ */
+static inline unsigned long cpu_avg_load_per_task(int cpu)
+{
+ runqueue_t *rq = cpu_rq(cpu);
+ unsigned long n = rq->nr_running;
+ unsigned long load = weighted_load(rq->prio_bias);
+
+ return n ? load / n : load;
+}
+
+/*
* find_idlest_group finds and returns the least busy CPU group within the
* domain.
*/
@@ -1309,6 +1321,7 @@ static int try_to_wake_up(task_t *p, uns

if (this_sd->flags & SD_WAKE_AFFINE) {
unsigned long tl = this_load;
+ unsigned long tl_per_task = cpu_avg_load_per_task(this_cpu);
/*
* If sync wakeup then subtract the (maximum possible)
* effect of the currently running task from the load
@@ -1318,8 +1331,8 @@ static int try_to_wake_up(task_t *p, uns
tl -= weighted_load(p->bias_prio);

if ((tl <= load &&
- tl + target_load(cpu, idx) <= SCHED_LOAD_SCALE) ||
- 100*(tl + SCHED_LOAD_SCALE) <= imbalance*load) {
+ tl + target_load(cpu, idx) <= tl_per_task) ||
+ 100*(tl + tl_per_task) <= imbalance*load) {
/*
* This domain has SD_WAKE_AFFINE and
* p is cache cold in this domain, and