Re: [Lse-tech] Minature NUMA scheduler

From: Martin J. Bligh (mbligh@aracnet.com)
Date: Fri Jan 10 2003 - 11:57:40 EST


> Having some sort of automatic node affinity of processes and equal
> node loads in mind (as design targets), we could:
> - take the minimal NUMA scheduler
> - if the normal (node-restricted) find_busiest_queue() fails and
> certain conditions are fulfilled (tried to balance inside own node
> for a while and didn't succeed, own CPU idle, etc... ???) rebalance
> over node boundaries (eg. using my load balancer)
> This actually resembles the original design of the node affine
> scheduler, having the cross-node balancing separate is ok and might
> make the ideas clearer.

This seems like the right approach to me, apart from the trigger for
the cross-node rebalance. I don't believe that has anything to do
with whether we're internally balanced within a node or not; what
matters is whether the nodes are balanced relative to each other. I
think we should just check that every N ticks, looking at node load
averages, and do a cross-node rebalance if they're "significantly out".

The definitions of "N ticks" and "significantly out" would be tunable
numbers, defined by each platform; roughly speaking, the lower the NUMA
ratio, the lower these numbers would be. That also allows us to wedge
all sorts of smarts in the NUMA rebalance part of the scheduler, such
as moving the tasks with the smallest RSS off node. The NUMA rebalancer
is obviously completely missing from the current implementation, and
I expect we'd use mainly Erich's current code to implement that.
However, it's surprising how well we do with no rebalancer at all,
apart from the exec-time initial load balance code.
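
To make the trigger concrete, here is a rough userspace sketch of the
check described above. It is not kernel code and not taken from Erich's
patch; struct numa_tunables, find_overloaded_node(), numa_tick() and
pick_smallest_rss() are all hypothetical names, and expressing
"significantly out" as a percentage above the mean node load is just
one assumed definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-platform tunables; not real kernel symbols. */
struct numa_tunables {
    unsigned int rebalance_ticks; /* "N ticks" between cross-node checks */
    unsigned int imbalance_pct;   /* "significantly out": percent above the mean node load */
};

/* Hypothetical task descriptor, only what the sketch needs. */
struct task_info {
    int pid;
    unsigned long rss_pages;
};

/*
 * Return the index of the busiest node if its load exceeds the mean
 * node load by more than imbalance_pct percent, otherwise -1.
 * The comparison is cross-multiplied to avoid integer division.
 */
int find_overloaded_node(const unsigned int *node_load, size_t nr_nodes,
                         const struct numa_tunables *t)
{
    unsigned long total = 0;
    size_t busiest = 0;
    size_t i;

    assert(nr_nodes > 0);
    for (i = 0; i < nr_nodes; i++) {
        total += node_load[i];
        if (node_load[i] > node_load[busiest])
            busiest = i;
    }
    if ((unsigned long)node_load[busiest] * nr_nodes * 100 >
        total * (100 + t->imbalance_pct))
        return (int)busiest;
    return -1;
}

/*
 * Called on every tick; only every rebalance_ticks-th tick looks at
 * the node loads. Returns the node to pull work from, or -1.
 */
int numa_tick(unsigned long tick, const unsigned int *node_load,
              size_t nr_nodes, const struct numa_tunables *t)
{
    if (t->rebalance_ticks == 0 || tick % t->rebalance_ticks != 0)
        return -1;
    return find_overloaded_node(node_load, nr_nodes, t);
}

/* Pick the task with the smallest RSS as the migration candidate. */
const struct task_info *pick_smallest_rss(const struct task_info *tasks,
                                          size_t nr_tasks)
{
    const struct task_info *best;
    size_t i;

    if (nr_tasks == 0)
        return NULL;
    best = &tasks[0];
    for (i = 1; i < nr_tasks; i++)
        if (tasks[i].rss_pages < best->rss_pages)
            best = &tasks[i];
    return best;
}
```

A platform with a low NUMA ratio would set both rebalance_ticks and
imbalance_pct low; one with expensive remote accesses would raise them
so that tasks stay put unless the nodes are badly out of balance.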

Another big advantage of this approach is that it *obviously* changes
nothing at all for standard SMP systems (whereas your current patch does),
so it should be much easier to get it accepted into mainline ....

M.