Re: scheduler scalability - cgroups, cpusets and load-balancing

From: Peter Zijlstra
Date: Tue Jan 29 2008 - 07:22:01 EST



On Tue, 2008-01-29 at 05:53 -0600, Paul Jackson wrote:
> Peter wrote;
> > So, I don't think we need that, I think we can do with the single flag,
> > we just need to find these disjoint sets and stick our rt-domain there.
>
> Ah - perhaps you don't need that flag - but my other cpuset users do ;).
>
> You see, there are two very different ways that 'sched_load_balance' is
> used in practice.
>
> The other way is by big batch schedulers. They may be placed in charge
> of managing a few hundred CPUs on a system, and might be running a mix
> of many small jobs each covering only a few CPUs. They routinely setup
> one cpuset for each job, to contain that job to the CPUs and memory
> nodes assigned to it. This is actually the original motivating use for
> cpusets.
>
> As a bit of optimization, batch schedulers desire to tell the normal
> kernel scheduler -not- to bother load balancing across the big set of
> CPUs controlled by the batch scheduler, but only to load balance within
> each of the smaller per-job cpusets. Load balancing across hundreds
> of CPUs when the batch scheduler knows such efforts would be fruitless
> is a waste of good CPU cycles in kernel/sched.c.
>
> I really doubt we'd want to have such systems triggering the hard RT
> scheduler on whatever CPUs were in the batch schedulers big cpuset
> that didn't happened to have an active job currently assigned to them.

My turn to be confused..

If SD_LOAD_BALANCE is only set on the smaller, per-job, sets, how will
the RT balancer trigger on the large set?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/