Re: [PATCH V2] sched: Improve load balancing in the presence of idle CPUs

From: Morten Rasmussen
Date: Wed Apr 01 2015 - 09:03:21 EST


Hi Preeti and Jason,

On Wed, Apr 01, 2015 at 07:28:03AM +0100, Preeti U Murthy wrote:
> On 03/31/2015 11:00 PM, Jason Low wrote:
> > On Tue, 2015-03-31 at 14:28 +0530, Preeti U Murthy wrote:
> >
> >> Morten,
> >
> >> I am a bit confused about the problem you are pointing to.
> >
> >> I am unable to see the issue. What is it that I am missing ?
> >
> > Hi Preeti,
> >
> > Here is one of the potential issues that have been described from my
> > understanding.
> >
> > In situations where there are just a few tasks to pull (for example,
> > there's 1 task to move).
> >
> > Before, if CPU 1 calls run_rebalance_domains(), we'll pull the tasks to
> > this CPU 1 (which is already awake) and run the task on CPU 1.
> >
> > Now, we'll pull the task to some idle CPU 2 and wake up CPU 2 in order
> > for the task to run. Meanwhile, CPU 1 may go idle, instead of running
> > the task on CPU 1 which was already awake.

Yes. This is the scenario I had in mind although I might have failed to
make it crystal clear in my earlier replies.
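
To make it concrete, here is a toy model of the two behaviours for a
single task. It is plain Python rather than kernel code, and the
function, its arguments and the "one extra wake-up" accounting are all
made up for illustration:

```python
def place_single_task(balancer_cpu, idle_cpus, prefer_idle_cpus):
    """Toy model: decide where one pulled task ends up.

    Returns (target_cpu, extra_wakeups). All names are hypothetical;
    this is not how the kernel represents any of it.
    """
    if prefer_idle_cpus and idle_cpus:
        # New behaviour (assumed): the task lands on an idle cpu, which
        # has to be woken up to run it.
        return idle_cpus[0], 1
    # Old behaviour: the balancer cpu is already awake and takes the task.
    return balancer_cpu, 0


# CPU 1 is the balancer, CPU 2 is idle, one task to move.
print(place_single_task(1, [2], prefer_idle_cpus=False))
print(place_single_task(1, [2], prefer_idle_cpus=True))
```

In the first call the task stays on the cpu that is already awake; in
the second it costs one additional wake-up.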

> Alright I see. But it is one additional wake up. And the wake up will be
> within the cluster. We will not wake up any CPU in the neighboring
> cluster unless there are tasks to be pulled. So, we can wake up a core
> out of a deep idle state and never a cluster in the problem described.
> In terms of energy efficiency, this is not so bad a scenario, is it?

After Peter pointed out that it shouldn't happen across clusters due to
group_classify()/sg_capacity_factor(), it isn't as bad as I initially
thought. I still don't think it is an ideal solution, though. Wake-ups
aren't nice for battery-powered devices. Waking up a cpu in an already
active cluster may still imply powering up the core and bringing the L1
cache into a usable state, even if that isn't as bad as waking up a
whole cluster. I would prefer to avoid it if we can.

Thinking more about it, don't we also risk doing a lot of iterations in
nohz_idle_balance() that lead to nothing (pure overhead) in certain
corner cases? Say find_new_ilb() returns the last cpu in the cluster,
and we have one task for each cpu in the cluster, except that one cpu
currently has two. Don't we end up trying all nohz-idle cpus before
giving up and balancing the balancer cpu itself? On big machines, going
through every cpu could take a while, I think. No?
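
A rough sketch of the worry, again as a toy Python model and not the
actual nohz_idle_balance() loop. The function name, the can_pull
callback and the assumption that the walk stops at the first successful
pull are mine, purely for illustration:

```python
def count_balance_attempts(nohz_idle_cpus, balancer_cpu, can_pull):
    """Toy model: balance on behalf of the other idle cpus first and
    the balancer cpu last.

    Returns (attempts, cpu_that_pulled). can_pull(cpu) is a hypothetical
    predicate saying whether a balance attempt for that cpu moves a task.
    """
    attempts = 0
    for cpu in nohz_idle_cpus:
        if cpu == balancer_cpu:
            continue  # the balancer cpu is handled last
        attempts += 1
        if can_pull(cpu):
            return attempts, cpu
    # Nobody else could pull anything; fall back to the balancer itself.
    attempts += 1
    return attempts, balancer_cpu


# Worst case: 64 nohz-idle cpus, cpu 63 is the balancer, and no other
# cpu manages to pull.
print(count_balance_attempts(list(range(64)), 63, lambda cpu: False))
```

In this model the wasted attempts grow linearly with the number of
nohz-idle cpus: all 63 others are tried before the balancer cpu gets
its turn.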

Morten