[PATCH 0/7] CPU controller - V1

From: Srivatsa Vaddagiri
Date: Sun Aug 20 2006 - 13:38:18 EST

This patch is an attempt to provide a minimal CPU controller for resource
management purposes. Although several such controllers have been proposed
before, none has gained acceptance yet. For example,
http://lkml.org/lkml/2006/8/4/137 summarizes the feedback received on the
last CPU controller posted by the Fujitsu folks.

This is V1 of the patch posted at http://lkml.org/lkml/2006/8/4/6
and has been tested against 2.6.18-rc3.

Changes since the last version:

- Basic task-group aware SMP load balancing, based on SMP nice

Still on my todo list:

- Cleanup so that !CONFIG_CPUMETER has zero impact
- Better timeslice management, so that bursty workloads
can be handled well. I have some ideas (for example, not
switching the active/expired arrays as soon as the active
array becomes empty) which I would like to experiment
with a bit before sending a patch for this
- Better SMP load balancing. I feel that some feedback
about how much bandwidth a task-group is actually
getting (compared to its quota) can be included in
runqueue pressure calculations.

Salient design points of this patch:

- Each task-group gets its own runqueue on every cpu.

- In addition, there is an active and an expired array of
task-groups themselves. Task-groups that have exhausted their
quota are put into the expired array.

- Task-groups have priorities. The priority of a task-group is the
same as the priority of the highest-priority runnable task it
has. I feel this will retain the interactivity of the system
as it is today.

- Scheduling the next task involves first picking the highest-priority
task-group from the active array and then picking the highest-priority
task within it. Both steps are O(1).

- Tokens are assigned to task-groups based on their assigned
quota. Once a task-group runs out of tokens, it is put
in the expired array. An array switch happens when the active
array is empty.

- SMP load-balancing is done along the lines of smpnice.

- Although the algorithm is very simple, it perhaps needs more
refinement to handle different cases. In particular, I feel
that task-groups which are idle most of the time and
experience bursts once in a while will need to be handled
better than they are by this simple scheme.
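
To make the two-level pick and the token accounting described above
concrete, here is a small userspace sketch. It is not code from the patch.
All names (task_group, group_array, cpu_rq, pick_next, charge_tick,
run_demo) and the tiny MAX_PRIO/MAX_GROUPS limits are made up for
illustration. It assumes one token is charged per tick, that tokens are
refilled when a group expires, and that lower numbers mean higher priority;
the description above does not specify any of these. It also relies on the
GCC/Clang builtin __builtin_ctz for the bitmap search.

```c
#include <assert.h>
#include <string.h>

#define MAX_PRIO   8	/* hypothetical; far smaller than the kernel's range */
#define MAX_GROUPS 4	/* hypothetical limit for this sketch */

struct task_group {
	unsigned int task_bitmap;	/* bit p set: runnable task at prio p */
	int tokens;			/* tokens left before the group expires */
	int quota;			/* tokens granted on refill (assumed) */
};

struct group_array {
	unsigned int prio_bitmap;		/* bit p set: a group at prio p */
	unsigned int groups_at[MAX_PRIO];	/* bit g set: group g at prio p */
};

struct cpu_rq {
	struct task_group groups[MAX_GROUPS];
	struct group_array arrays[2];
	struct group_array *active, *expired;
};

/* Group priority = priority of its best runnable task (lowest set bit). */
static int group_prio(const struct task_group *tg)
{
	assert(tg->task_bitmap != 0);
	return __builtin_ctz(tg->task_bitmap);
}

static void enqueue_group(struct group_array *a, int g, int prio)
{
	a->groups_at[prio] |= 1u << g;
	a->prio_bitmap |= 1u << prio;
}

static void dequeue_group(struct group_array *a, int g, int prio)
{
	a->groups_at[prio] &= ~(1u << g);
	if (!a->groups_at[prio])
		a->prio_bitmap &= ~(1u << prio);
}

/* Two bitmap lookups: best group first, then the best task within it. */
static int pick_next(struct cpu_rq *rq, int *task_prio)
{
	struct group_array *tmp;
	int prio, g;

	if (!rq->active->prio_bitmap) {	/* active array empty: switch */
		tmp = rq->active;
		rq->active = rq->expired;
		rq->expired = tmp;
		if (!rq->active->prio_bitmap)
			return -1;	/* nothing runnable at all */
	}
	prio = __builtin_ctz(rq->active->prio_bitmap);
	g = __builtin_ctz(rq->active->groups_at[prio]);
	*task_prio = group_prio(&rq->groups[g]);
	return g;
}

/* Charge one token; a group that runs out moves to the expired array. */
static void charge_tick(struct cpu_rq *rq, int g)
{
	struct task_group *tg = &rq->groups[g];
	int prio = group_prio(tg);

	if (--tg->tokens > 0)
		return;
	dequeue_group(rq->active, g, prio);
	tg->tokens = tg->quota;		/* refill on expiry (assumption) */
	enqueue_group(rq->expired, g, prio);
}

static void add_group(struct cpu_rq *rq, int g, int quota, int task_prio)
{
	struct task_group *tg = &rq->groups[g];

	tg->task_bitmap = 1u << task_prio;
	tg->quota = tg->tokens = quota;
	enqueue_group(rq->active, g, group_prio(tg));
}

/* Record which group ('A' = group 0, 'B' = group 1) runs on each tick. */
void run_demo(char *out, int ticks)
{
	struct cpu_rq rq;
	int i, g, prio;

	memset(&rq, 0, sizeof(rq));
	rq.active = &rq.arrays[0];
	rq.expired = &rq.arrays[1];
	add_group(&rq, 0, 3, 2);	/* group A: quota 3, task at prio 2 */
	add_group(&rq, 1, 1, 5);	/* group B: quota 1, task at prio 5 */

	for (i = 0; i < ticks; i++) {
		g = pick_next(&rq, &prio);
		if (g < 0)
			break;
		out[i] = 'A' + g;
		charge_tick(&rq, g);
	}
	out[i] = '\0';
}
```

Calling run_demo() with a 16-byte buffer and 8 ticks yields "AAABAAAB":
group A uses its three tokens and expires, group B then gets its one tick,
and the arrays are switched once the active array is empty. The 3:1 split
follows the quotas even though A always has the higher priority.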

As before, I would appreciate your feedback.
