Re: [PATCH] Fix longstanding load balancing bug in the scheduler.

From: Christoph Lameter
Date: Thu Sep 07 2006 - 13:21:39 EST


Hmmm... Some more comments

On Thu, 7 Sep 2006, Nick Piggin wrote:

> So what I worry about with this approach is that it can really blow
> out the latency of a balancing operation. Say you have N-1 CPUs with
> lots of stuff locked on their runqueues.

Right but that situation is rare and the performance is certainly
better if the unpinned processes are not all running on the same
processor.

> The solution I envisage is to do a "rotor" approach. For example
> the last attempted CPU could be stored in the starving CPU's sd...
> and it will subsequently try another one.

That wont work since the notion of "pinned" is relative to a cpu.
A process may be pinned to a group of processors. It will only appear to
be pinned for cpus outside that set of processors!

What good does storing a processor number do if the processes can change
dynamically and so may the pinning of processors. We are dealing with
a set of processes. Each of those may be pinned to a set of processors.

> I've been hot and cold on such an implementation for a while: on one
> hand it is a real problem we have; OTOH I was hoping that the domain
> balancing might be better generalised. But I increasingly don't
> think we should let perfect stand in the way of good... ;)

I think we should fix the problem ASAP. Lets do this fix and then we can
think about other solutions. You already had lots of time to think about
the rotor.

This looks to me as a design flaw that would require either a major rework
of the scheduler or (my favorite) a delegation of difficult (and
computational complex and expensive) scheduler decisions to user space.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/