[patch] CFS scheduler, -v19

From: Ingo Molnar
Date: Fri Jul 06 2007 - 13:34:18 EST



i'm pleased to announce release -v19 of the CFS scheduler patchset.

The rolled-up CFS patch against today's -git kernel, v2.6.22-rc7,
v2.6.22-rc6-mm1, v2.6.21.5 or v2.6.20.14 can be downloaded from the
usual place:

http://people.redhat.com/mingo/cfs-scheduler/

The biggest user-visible change in -v19 is reworked sleeper fairness:
it's similar in behavior to -v18 but works more consistently across nice
levels. Fork-happy workloads (like kernel builds) should behave better
as well. There are also a handful of speedups: unsigned math, 32-bit
speedups, O(1) task pickup, debloating and other micro-optimizations.

Changes since -v18:

- merged the group-scheduling CFS-core changes from Srivatsa Vaddagiri.
This makes up for the bulk of the changes in -v19 but has no
behavioral impact. The final group-fairness enabler patch is now a
small and lean add-on patch to CFS.

- fix the bloat noticed by Andrew. On 32-bit it's now this:

text data bss dec hex filename
24362 3905 24 28291 6e83 sched.o-rc7
33015 2538 20 35573 8af5 sched.o-v18
25805 2426 20 28251 6e5b sched.o-v19

so it's a net win compared to vanilla. On 64-bit it's even better:

text data bss dec hex filename
35732 40314 2168 78214 13186 sched.o.x64-rc7
41397 37642 2168 81207 13d37 sched.o.x64-v18
36132 37410 2168 75710 127be sched.o.x64-v19

( and there's also a +1.5K data win per CPU on x32, which is not
shown here. [+3.0K data win per CPU on x64.] )

- good number of core code updates, cleanups and streamlining.
(Mike Galbraith, Srivatsa Vaddagiri, Dmitry Adamushko, me.)

- use unsigned data types almost everywhere in CFS. This produces
faster and smaller code, and simplifies the logic.

- turn as many 'u64' data types into 'unsigned long' as possible, to
reduce the 32-bit footprint and to reduce 64-bit arithmetics.

- replaced the nr_running based 'sleep fairness' logic with a more
robust concept. The end-result is similar in behavior to v18, but
negative nice levels are handled much better in this scheme.

- speedup: O(1) task pickup by Srivatsa Vaddagiri. [sleep/wakeup is
O(log2(nr_running)).] This gives 5-10% better hackbench 100/500
results on a 4-way box.

- fix: set idle->sched_class back to &idle_sched_class in
migration_call(). (Dmitry Adamushko)

- cleanup: use an enum for the sched_feature flags. (suggested by
Andrew Morton)

- cleanup: turn the priority macros into inlines. (suggested by
Andrew Morton)

- (other cleanups suggested by Andrew Morton)

- debug: split out the debugging data into CONFIG_SCHED_DEBUG.

As usual, any sort of feedback, bugreport, fix and suggestion is more
than welcome!

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/