scheduler scalability - cgroups, cpusets and load-balancing

From: Peter Zijlstra
Date: Tue Jan 29 2008 - 04:55:18 EST


Hi All,

Some of the fancy new scheduler features such as the cgroup load
balancer (load_balance_monitor) and the real-time load balancer are a
bit of an scalability issue. They all seem to want a rather strong
global bound to keep a global fairness (which is quite understandable).

[ my own interest is currently real-time group scheduling on multiple
cpus, and that seems to require _very_ strong bonds ]

I think the current stuff would scale up to 8 maybe 16 cpus, but after
that I'd be real worried.

Now we want distributions to enable most of these features. Distros seem
to want containers, but distros also need to support 128+ cpu machines,
so how are we going to solve this.

My thoughts were to make stronger use of disjoint cpu-sets. cgroups and
cpusets are related, in that cpusets provide a property to a cgroup.
However, load_balance_monitor()'s interaction with sched domains
confuses me - it might DTRT, but I can't tell.

[ It looks to me it balances a group over the largest SD the current cpu
has access to, even though that might be larger than the SD associated
with the cpuset of that particular cgroup. ]

Also the RT load-balance needs to become aware of such these sets, I
think Paul J and Steven once talked about it, but can't quite remember
where that ended. From my POV there should be sched-domain based balance
information, not global.

By cutting the problem into smaller pieces, and adding tunables to
weaken to global fairness, I think we can give administrators enough
freedom to make use of these features, even on the largest of machines.

[ so I'd move the load_balance_monitor() tunables into cpusets as well,
I can imagine a smaller cpuset wanting a stronger fairness than a much
larger cpuset. ]

I understand its a somewhat hand-wavey email, but I wanted to start
discussion on the issue, or have someone show me I'm wrong and can stop
worrying :-).

Peter

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/