[PATCH] 2.3.41 scheduler change

From: Rik van Riel (riel@nl.linux.org)
Date: Sat Jan 29 2000 - 11:38:47 EST


Hi,

Since the biggest scheduling penalty is repopulating the cache
with (user) pages from the newly scheduled process, I've written
a scheduler patch for 2.3.41 that tries to optimise for that.

The patch no longer uses p->counter for priority purposes, which
means we should incur fewer cache misses in the following (quite
common?) situation:

- you have a number of cpus, N (N = 1, 2, whatever)
- you have N+1 processes using a fair amount of CPU
- you have vi, pine, mutt, whatever using very little CPU

When you press a key in vi and the currently running CPU-bound
process gets descheduled, it will have a lower priority than the
other CPU-bound process (because it has used part of its timeslice),
so immediately after vi the _other_ CPU-bound process gets
scheduled. This can result in unneeded cache thrashing.

My patch calculates a slower-moving priority, so that short-term
CPU use (within one timeslice) doesn't influence priority. This
means the _same_ CPU-bound process will be rescheduled after vi
goes back to sleep, which should reduce cache thrashing and
increase efficiency.

I haven't done a lot of performance tests with this scheduler
yet, but xmms seems to use about 20% less CPU with this patch
than without ... however, this is on a dual pentium with shared
L2 cache. I suspect that a dual celeron or Xeon will see the
biggest benefits of this patch.

I'm interested in any comments on, and measurements of, this patch.

p->counter and p->priority have been changed into shorts because
the short->int conversion is most probably much cheaper than an
extra cache miss; any numbers on this are very welcome too.

regards,

Rik

--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

--- linux-2.3.41/fs/proc/array.c.orig	Sat Jan 29 03:55:49 2000
+++ linux-2.3.41/fs/proc/array.c	Sat Jan 29 14:47:58 2000
@@ -317,7 +317,7 @@
 
 	/* scale priority and nice values from timeslices to -20..20 */
 	/* to make it look like a "normal" Unix priority/nice value */
-	priority = task->counter;
+	priority = (task->dynpriority >> (SHIFT_PRIORITY - 1));
 	priority = 20 - (priority * 10 + DEF_PRIORITY / 2) / DEF_PRIORITY;
 	nice = task->priority;
 	nice = 20 - (nice * 20 + DEF_PRIORITY / 2) / DEF_PRIORITY;
--- linux-2.3.41/kernel/sched.c.orig	Sat Jan 29 02:52:17 2000
+++ linux-2.3.41/kernel/sched.c	Sat Jan 29 15:42:46 2000
@@ -110,7 +110,7 @@
 static inline int goodness(struct task_struct * p, int this_cpu,
 	struct mm_struct *this_mm)
 {
-	int weight;
+	int weight = 0;
 
 	/*
 	 * Realtime process, select the first one on the
@@ -123,16 +123,17 @@
 	}
 
 	/*
-	 * Give the process a first-approximation goodness value
-	 * according to the number of clock-ticks it has left.
-	 *
-	 * Don't do any other calculations if the time slice is
-	 * over..
+	 * Don't do any calculations if the time slice is over..
 	 */
-	weight = p->counter;
-	if (!weight)
+	if (!p->counter)
 		goto out;
 
+	/*
+	 * Our basic priority, we can get some bonusses later.
+	 * All of this is done to be cache friendly.
+	 */
+	weight = p->dynpriority >> SHIFT_PRIORITY;
+
 #ifdef __SMP__
 	/* Give a largish advantage to the same processor... */
 	/* (this is equivalent to penalizing other processors) */
@@ -141,9 +142,8 @@
 #endif
 
 	/* .. and a slight advantage to the current MM */
-	if (p->mm == this_mm)
+	if (p->active_mm == this_mm)
 		weight += 1;
-	weight += p->priority;
 
 out:
 	return weight;
@@ -173,7 +173,7 @@
  */
 static inline int preemption_goodness(struct task_struct * prev, struct task_struct * p, int cpu)
 {
-	return goodness(p, cpu, prev->mm) - goodness(prev, cpu, prev->mm);
+	return goodness(p, cpu, prev->active_mm) - goodness(prev, cpu, prev->active_mm);
 }
 
 /*
@@ -609,7 +609,11 @@
 		spin_unlock_irq(&runqueue_lock);
 		read_lock(&tasklist_lock);
 		for_each_task(p)
-			p->counter = (p->counter >> 1) + p->priority;
+			/* we won't dirty the cacheline when we don't have to */
+			if (p->dynpriority != (p->priority << SHIFT_PRIORITY) || p->counter != p->priority) {
+				p->dynpriority = p->dynpriority - (p->dynpriority >> SHIFT_PRIORITY) + p->counter;
+				p->counter = p->priority;
+			}
 		read_unlock(&tasklist_lock);
 		spin_lock_irq(&runqueue_lock);
 	}
--- linux-2.3.41/kernel/fork.c.orig	Sat Jan 29 13:04:43 2000
+++ linux-2.3.41/kernel/fork.c	Sat Jan 29 15:19:08 2000
@@ -714,6 +714,8 @@
 	 */
 	current->counter >>= 1;
 	p->counter = current->counter;
+	current->dynpriority -= (current->dynpriority >> SHIFT_PRIORITY);
+	p->dynpriority = current->dynpriority;
 
 	/*
 	 * Ok, add it to the run-queues and make it
--- linux-2.3.41/include/linux/sched.h.orig	Sat Jan 29 02:46:03 2000
+++ linux-2.3.41/include/linux/sched.h	Sat Jan 29 04:46:01 2000
@@ -273,8 +273,9 @@
 	cycles_t avg_slice;
 	int lock_depth;		/* Lock depth. We can context switch in and out of holding a syscall kernel lock... */
 /* begin intel cache line */
-	long counter;
-	long priority;
+	short counter;
+	short priority;
+	long dynpriority;
 	unsigned long policy;
 /* memory management info */
 	struct mm_struct *mm, *active_mm;
@@ -385,6 +386,7 @@
 #define _STK_LIM	(8*1024*1024)
 
 #define DEF_PRIORITY	(20*HZ/100)	/* 200 ms time slices */
+#define SHIFT_PRIORITY	6		/* priority calculation falloff */
 
 /*
  * INIT_TASK is used to set up the first task table, touch at
@@ -393,7 +395,7 @@
 #define INIT_TASK(name) \
 /* state etc */	{ 0,0,0,KERNEL_DS,&default_exec_domain,0, \
 /* avg_slice */	0, -1, \
-/* counter */	DEF_PRIORITY,DEF_PRIORITY,SCHED_OTHER, \
+/* counter */	DEF_PRIORITY,DEF_PRIORITY,(DEF_PRIORITY << SHIFT_PRIORITY),SCHED_OTHER, \
 /* mm */	NULL, &init_mm, \
 /* has_cpu */	0,0, \
 /* run_list */	LIST_HEAD_INIT(init_task.run_list), \
--- linux-2.3.41/arch/i386/kernel/process.c.orig	Sat Jan 29 13:31:30 2000
+++ linux-2.3.41/arch/i386/kernel/process.c	Sat Jan 29 13:33:08 2000
@@ -90,6 +90,7 @@
 	init_idle();
 	current->priority = 0;
 	current->counter = -100;
+	current->dynpriority = 0;
 
 	while (1) {
 		void (*idle)(void) = acpi_idle;

