Re: [PATCH 0/4] sched: remove cpu_load decay

From: Morten Rasmussen
Date: Tue Dec 17 2013 - 13:21:41 EST


On Tue, Dec 17, 2013 at 03:37:23PM +0000, Peter Zijlstra wrote:
> On Tue, Dec 17, 2013 at 02:04:57PM +0000, Morten Rasmussen wrote:
> > On Sat, Dec 14, 2013 at 01:27:59PM +0000, Alex Shi wrote:
> > > On 12/14/2013 04:03 AM, Peter Zijlstra wrote:
> > > >
> > > >
> > > > I had a quick peek at the actual patches.
> > > >
> > > > afaict we're now using weighted_cpuload() aka runnable_load_avg as the
> > > > ->cpu_load. Whatever happened to also using the blocked_avg?
> >
> > AFAICT, ->cpu_load is actually a snapshot value of weighted_cpuload()
> > that gets updated occasionally. That has been the case since b92486cb.
> > By removing the cpu_load indexes, {source,target}_load() are now comparing
> > an old snapshot of weighted_cpuload() with the current value. I don't
> > think that really makes sense.
>
> Agreed. Worse, cpu_load is a very recent snapshot, so there has not
> been much chance for the two to diverge since we last looked at
> it.
>
> [ for busy load-balance, for newidle there might be since we can run
> between ticks ]
>
> > weighted_cpuload() may change rapidly
> > when tasks are enqueued or dequeued so the old snapshot doesn't have
> > much meaning in my opinion. Maybe I'm missing something?
>
> Right, which is where it makes sense to also account some of the blocked
> load, since that anticipates these arrivals/departures and should smooth
> out the overall load picture. Which is something that sounds right for
> balancing.
>
> You don't want to really care too much about the high freq fluctuation,
> but care more about the longer term load.
>
> Or rather -- and this is where the idx thing came from, you want a
> longer term view the bigger your sched_domain is. Since that balances
> nicely against the cost of actually moving tasks around.

That makes sense.
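
To make the idx point concrete, here is a rough userspace sketch of
per-index decay. It is loosely modeled on how the cpu_load[] array is
aged once per tick, but it is a simplified illustration and not the
kernel code: the function name and array handling are my assumptions,
and the nohz catch-up for missed ticks is left out entirely.

```c
#define CPU_LOAD_IDX_MAX 5

/*
 * Simplified model (assumption, not kernel code): index 0 is the raw
 * snapshot, index i blends the old value and the new sample with
 * weights (2^i - 1) : 1, so higher indexes react more slowly.
 */
void update_cpu_load(unsigned long cpu_load[CPU_LOAD_IDX_MAX],
                     unsigned long this_load)
{
    unsigned long scale = 2;
    int i;

    cpu_load[0] = this_load;
    for (i = 1; i < CPU_LOAD_IDX_MAX; i++, scale += scale) {
        unsigned long old_load = cpu_load[i];
        unsigned long new_load = this_load;

        /* Round up on rising load so the average can reach the sample. */
        if (new_load > old_load)
            new_load += scale - 1;

        cpu_load[i] = (old_load * (scale - 1) + new_load) >> i;
    }
}
```

Feeding this a single busy tick followed by idle ticks shows index 0
dropping back to zero at once while the higher indexes move only a
fraction of the way per tick, which is the longer-term view a bigger
sched_domain would want.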

>
> And while runnable_load_avg still includes high freq arrival/departure
> events, the runnable+blocked load should have much less of that.

Agreed, we need either a smoothed version of runnable_load_avg or to add
the blocked load (given that we fix the priority issue).
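
As a toy illustration of why runnable+blocked is smoother, assume a
task's contribution moves from the runnable sum to the blocked sum when
it sleeps, and that the blocked sum then decays geometrically with a
half-life of 32 periods. The struct, the function names and the use of
floating point below are assumptions for the sketch, not the kernel's
fixed-point implementation.

```c
/* Per-period decay factor chosen so that DECAY_Y^32 is about 0.5. */
#define DECAY_Y 0.97857206

/* Hypothetical model of the two per-runqueue load sums. */
struct cfs_load_model {
    double runnable_load_avg;
    double blocked_load_avg;
};

/* A task goes to sleep: its contribution leaves runnable, joins blocked. */
void task_sleeps(struct cfs_load_model *l, double contrib)
{
    l->runnable_load_avg -= contrib;
    l->blocked_load_avg += contrib;
}

/* Age the blocked load by a number of periods. */
void decay_blocked(struct cfs_load_model *l, unsigned int periods)
{
    while (periods--)
        l->blocked_load_avg *= DECAY_Y;
}

/* The smoother signal: runnable plus (decaying) blocked load. */
double smoothed_load(const struct cfs_load_model *l)
{
    return l->runnable_load_avg + l->blocked_load_avg;
}
```

In this model runnable_load_avg steps down by the full contribution the
moment the task sleeps, whereas smoothed_load() stays put at that
instant and only drifts down as the blocked part decays.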

There is actually another long-term view of the cpu load in
rq->avg.runnable_avg_sum, but I think it might be too conservative. Also,
it doesn't track the weight of the tasks on the cpu, just whether the
cpu was idle or not.

>
> > Comparing cpu_load indexes with different decay rates in {source,
> > target}_load() sort of makes sense as it makes load-balancing decisions
> > more conservative.
>
> *nod*
>
> > I believe we have discussed using blocked_load_avg in weighted_cpuload()
> > in the past. While it seems to be the right thing to include it, it
> > causes problems related to the priority scaling of the task loads.
> > If you include blocked load in weighted_cpuload() and have a tiny
> > (very low cpu utilization) task running at very high priority, your
> > weighted_cpuload() will be quite high and will force other normal
> > priority tasks away from the cpu, leaving the cpu idle most of the time.
>
> Ah, right. Which is where we should look at balancing utilization as
> well as weight.
>
> Let me ponder this a bit more.

Yes. At least for Android devices this is a big deal.

Would it be too invasive to have an unweighted_cpuload() for balancing
utilization? It would require maintaining an unweighted version of
runnable_load_avg and blocked load.
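
A quick sketch of the difference, assuming a task's contribution is its
weight scaled by the fraction of time it was runnable. The struct and
helper names are made up for illustration, and unweighted_contrib() is
only what I have in mind, not something that exists today. The weights
are the usual nice-to-weight values (nice 0 = 1024, nice -20 = 88761).

```c
#define NICE_0_WEIGHT 1024UL

/* Hypothetical per-task model: weight plus runnable time over a period. */
struct task_model {
    unsigned long weight;        /* priority-scaled load weight */
    unsigned long runnable_sum;  /* time spent runnable */
    unsigned long period;        /* length of the tracking window */
};

/* Priority-scaled contribution, as used for weighted load. */
unsigned long weighted_contrib(const struct task_model *t)
{
    return t->weight * t->runnable_sum / t->period;
}

/* Assumed unweighted variant: every task counts as nice 0. */
unsigned long unweighted_contrib(const struct task_model *t)
{
    return NICE_0_WEIGHT * t->runnable_sum / t->period;
}
```

With a nice -20 task that is runnable 1% of the time and a nice 0 task
that is runnable 50% of the time, the weighted contributions come out
at 887 and 512, so the nearly idle cpu looks like the busier one. The
unweighted contributions are 10 and 512, which matches the actual
utilization.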

Maybe you have better ideas.

>
> > >
> > > When enabling the sched_avg in load balance, I didn't find any positive
> > > test results in several attempts at using blocked_avg, just a few
> > > regressions. :(
> > >
> > > And since this patchset is almost purely cleanup, I did not try
> > > blocked_load_avg again...
> >
> > My worry here is that I don't really understand why the current code
> > works when the decayed cpu_load has been removed.
>
> Not too much different from before I think; but it does lose the longer
> term view on the bigger domains. That in turn makes it slightly more
> aggressive, which can be good or bad depending on the workload (good on
> high spawn loads like hackbench, bad on more gentle stuff that has
> cache footprint).
>
> > > > I totally hate patch 4; it seems like a random hack to make up for the
> > > > lack of blocked_avg.
> > >
> > > Yes, this bias criterion seems a bit arbitrary. :)
> >
> > This is why I think {source, target}_load() and their use need to be
> > reconsidered.
>
> Aside from that, there's something entirely wrong with 4 in that we
> already have an imbalance between source and target loads, adding
> another basically random imbalance pass on top of that just doesn't make
> any kind of sense what so ff'ing ever.

Agreed.

Morten