Re: bug in sched.c:activate_task()

From: Ingo Molnar
Date: Tue Oct 05 2004 - 03:23:40 EST



* Chen, Kenneth W <kenneth.w.chen@xxxxxxxxx> wrote:

> Update p->timestamp to "now" in activate_task() doesn't look right to
> me at all. p->timestamp records last time it was running on a cpu.
> activate_task shouldn't update that variable when it queues a task on
> the runqueue.

it is being used for multiple purposes: measuring on-CPU time, measuring
on-runqueue time (for interactivity) and measuring off-runqueue time
(for interactivity). It is also used to measure cache-hotness, by the
balancing code.

now, this particular update of p->timestamp:

> @@ -888,7 +888,6 @@ static void activate_task(task_t *p, run

> - p->timestamp = now;

is important for the interactivity code that runs in schedule() - it
wants to know how much time we spent on the runqueue without having run.

but you are right that the task_hot() use of p->timestamp is incorrect
for freshly woken up tasks - we want the balancer to be able to re-route
tasks to another CPU, as long as the task has not hit the runqueue yet
(which it hasnt where we consider it in the balancer).

the clean solution is to separate the multiple uses of p->timestamp:
with the patch below p->timestamp is purely used for the interactivity
code, and p->last_ran is for the rebalancer. The patch is ontop of your
task_hot() fix-patch. Does this work for your workload?

Ingo

Signed-off-by: Ingo Molnar <mingo@xxxxxxx>

--- linux/kernel/sched.c.orig
+++ linux/kernel/sched.c
@@ -185,7 +185,8 @@ static unsigned int task_timeslice(task_
else
return SCALE_PRIO(DEF_TIMESLICE, p->static_prio);
}
-#define task_hot(p, now, sd) ((now) - (p)->timestamp < (sd)->cache_hot_time)
+#define task_hot(p, now, sd) ((long long) ((now) - (p)->last_ran) \
+ < (long long) (sd)->cache_hot_time)

/*
* These are the runqueue data structures:
@@ -2833,7 +2834,7 @@ switch_tasks:
if (!(HIGH_CREDIT(prev) || LOW_CREDIT(prev)))
prev->interactive_credit--;
}
- prev->timestamp = now;
+ prev->timestamp = prev->last_ran = now;

sched_info_switch(prev, next);
if (likely(prev != next)) {
--- linux/include/linux/sched.h.orig
+++ linux/include/linux/sched.h
@@ -527,7 +527,7 @@ struct task_struct {

unsigned long sleep_avg;
long interactive_credit;
- unsigned long long timestamp;
+ unsigned long long timestamp, last_ran;
int activated;

unsigned long policy;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/