Re: scheduler scalability - cgroups, cpusets and load-balancing

From: Paul Jackson
Date: Tue Jan 29 2008 - 06:53:52 EST


Peter wrote;
> So, I don't think we need that, I think we can do with the single flag,
> we just need to find these disjoint sets and stick our rt-domain there.

Ah - perhaps you don't need that flag - but my other cpuset users do ;).

You see, there are two very different ways that 'sched_load_balance' is
used in practice.

The other way is by big batch schedulers. They may be placed in charge
of managing a few hundred CPUs on a system, and might be running a mix
of many small jobs each covering only a few CPUs. They routinely setup
one cpuset for each job, to contain that job to the CPUs and memory
nodes assigned to it. This is actually the original motivating use for
cpusets.

As a bit of optimization, batch schedulers desire to tell the normal
kernel scheduler -not- to bother load balancing across the big set of
CPUs controlled by the batch scheduler, but only to load balance within
each of the smaller per-job cpusets. Load balancing across hundreds
of CPUs when the batch scheduler knows such efforts would be fruitless
is a waste of good CPU cycles in kernel/sched.c.

I really doubt we'd want to have such systems triggering the hard RT
scheduler on whatever CPUs were in the batch schedulers big cpuset
that didn't happened to have an active job currently assigned to them.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@xxxxxxx> 1.940.382.4214
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/