Re: Scheduling the highest priority task

From: Ingo Molnar
Date: Thu Aug 02 2007 - 07:40:32 EST



* Martin Roehricht <ml@xxxxxxxxxxx> wrote:

> perhaps someone can give me a hint about what I should look for in
> order to change the ("old" 2.6.21) scheduler such that it schedules the
> highest priority task of a given runqueue.
> Given a multiprocessor system I currently observe that whenever there
> are two tasks on one CPU, the lower priority one is migrated to another
> CPU. But I don't understand why this happens. From looking at the source
> code I thought it should be the highest priority one (lowest bit set in
> the runqueue's bitmap) according to
>     idx = sched_find_first_bit(array->bitmap);
> within move_tasks(). The idx value is then used as an index (surprise)
> to the linked list of tasks of this particular priority and one task is
> picked:
>     head = array->queue + idx;
>     curr = head->prev;
>     tmp = list_entry(curr, struct task_struct, run_list);
>
> Can anybody confirm that my observations are correct that the
> scheduler picks the lowest priority job of a runqueue for migration?
> What needs to be changed in order to pick the highest priority one?

in the SMP migration code, the 'old scheduler' indeed picks the lowest
priority one, _except_ if that task is running on another CPU or is too
'cache hot':

	if (skip_for_load ||
	    !can_migrate_task(tmp, busiest, this_cpu, sd, idle, &pinned)) {

also, from the priority-queue at 'idx', we pick head->prev, i.e. we
process the list in the opposite order from schedule(). (This got
changed in CFS to process in the same direction - which is more logical
and also yields the most cache-cold tasks for migration.)
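
to make the list-direction point concrete, here is a small user-space
sketch (not kernel code). It models only what happens once a priority
level 'idx' has been chosen: schedule() takes the head of that list
(head->next), the quoted move_tasks() fragment takes the tail
(head->prev). The byte-per-level bitmap, the reduced struct task and the
helper names (find_first_prio, pick_for_schedule, pick_for_migration) are
assumptions made for illustration; the skip_for_load and
can_migrate_task() checks are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PRIO 140

struct list_head { struct list_head *next, *prev; };

/* Reduced stand-in for task_struct: only what the example needs. */
struct task {
	int id;
	int prio;
	struct list_head run_list;
};

struct prio_array {
	unsigned char bitmap[MAX_PRIO];   /* simplified: one byte per level */
	struct list_head queue[MAX_PRIO]; /* one circular list per level */
};

/* Same idea as list_entry(ptr, struct task_struct, run_list). */
#define task_of(ptr) \
	((struct task *)((char *)(ptr) - offsetof(struct task, run_list)))

static void array_init(struct prio_array *a)
{
	memset(a->bitmap, 0, sizeof(a->bitmap));
	for (int i = 0; i < MAX_PRIO; i++)
		a->queue[i].next = a->queue[i].prev = &a->queue[i];
}

/* Append at the tail of the task's priority list. */
static void enqueue(struct prio_array *a, struct task *t)
{
	struct list_head *head;

	assert(t->prio >= 0 && t->prio < MAX_PRIO);
	head = &a->queue[t->prio];
	t->run_list.prev = head->prev;
	t->run_list.next = head;
	head->prev->next = &t->run_list;
	head->prev = &t->run_list;
	a->bitmap[t->prio] = 1;
}

/* Stand-in for sched_find_first_bit(): lowest set index, or MAX_PRIO. */
static int find_first_prio(const struct prio_array *a)
{
	for (int i = 0; i < MAX_PRIO; i++)
		if (a->bitmap[i])
			return i;
	return MAX_PRIO;
}

/* Direction used by schedule(): first entry of the list at idx. */
static struct task *pick_for_schedule(struct prio_array *a, int idx)
{
	struct list_head *head = a->queue + idx;

	return head->next == head ? NULL : task_of(head->next);
}

/* Direction used by the quoted fragment: last entry of the list at idx. */
static struct task *pick_for_migration(struct prio_array *a, int idx)
{
	struct list_head *head = a->queue + idx;

	return head->prev == head ? NULL : task_of(head->prev);
}
```

with two tasks queued at the same priority, pick_for_schedule() returns
the one enqueued first and pick_for_migration() the one enqueued last;
with a single task on the list both return the same task.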

i hope this helps.

> Is my assumption wrong? Using printk()s within this code section makes
> the system just hang completely quite soon. The schedstats do not
> notify me immediately. So I am a bit lost on how to track down or
> trace the problem.

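yep, printk locks up there: the runqueue lock is held at that point, and
printk can itself end up doing a wakeup (of klogd) that needs a runqueue
lock. You can use my static tracer: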

http://people.redhat.com/mingo/latency-tracing-patches/

add explicit static tracepoints to the scheduler code you want to
instrument via trace_special(x,y,z) calls [with parameters that interest
you most], and you can read out the trace via:

http://people.redhat.com/mingo/latency-tracing-patches/trace-it.c
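
the reason this style of tracing is safe where printk is not can be
sketched in user space: a tracepoint only stores three values into a
preallocated buffer and returns, and the buffer is read out later. The
code below is a mock, not the tracer from the patches above; the
three-unsigned-long signature of trace_special(), the ring-buffer layout,
trace_read() and the note_migration_candidate() call site are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_ENTRIES 1024	/* power of two, so indices can be masked */

struct trace_entry { unsigned long v1, v2, v3; };

static struct trace_entry trace_buf[TRACE_ENTRIES];
static unsigned long trace_count;	/* total events recorded so far */

/*
 * Mock tracepoint with the call shape trace_special(x, y, z):
 * store three values, overwrite the oldest entry when full,
 * never print, never sleep, never take a lock.
 */
static void trace_special(unsigned long v1, unsigned long v2,
			  unsigned long v3)
{
	struct trace_entry *e = &trace_buf[trace_count & (TRACE_ENTRIES - 1)];

	e->v1 = v1;
	e->v2 = v2;
	e->v3 = v3;
	trace_count++;
}

/* Hypothetical instrumentation site inside a migration loop. */
static void note_migration_candidate(int idx, int pid, int skipped)
{
	assert(idx >= 0 && pid >= 0);
	trace_special((unsigned long)idx, (unsigned long)pid,
		      (unsigned long)skipped);
}

/*
 * Copy out up to 'max' of the most recent entries, oldest first.
 * Returns the number of entries copied.
 */
static size_t trace_read(struct trace_entry *out, size_t max)
{
	unsigned long avail = trace_count < TRACE_ENTRIES ?
			      trace_count : TRACE_ENTRIES;
	size_t n = avail < max ? (size_t)avail : max;
	unsigned long start = trace_count - n;

	for (size_t i = 0; i < n; i++)
		out[i] = trace_buf[(start + i) & (TRACE_ENTRIES - 1)];
	return n;
}
```

in the scheduler you would put the values you care about (e.g. idx, the
task's pid, whether it was skipped) into the three parameters, and read
the buffer out afterwards instead of printing from inside the loop.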

Ingo