Re: [PATCH 03/10] sched,fair: redefine runnable_load_avg as the sum of task_h_load

From: Josef Bacik
Date: Mon Jul 01 2019 - 15:22:40 EST


On Mon, Jul 01, 2019 at 12:47:35PM -0400, Rik van Riel wrote:
> On Mon, 2019-07-01 at 12:29 -0400, Josef Bacik wrote:
> > On Fri, Jun 28, 2019 at 04:49:06PM -0400, Rik van Riel wrote:
> > > The runnable_load magic is used to quickly propagate information
> > > about
> > > runnable tasks up the hierarchy of runqueues. The runnable_load_avg
> > > is
> > > mostly used for the load balancing code, which only examines the
> > > value at
> > > the root cfs_rq.
> > >
> > > Redefine the root cfs_rq runnable_load_avg to be the sum of
> > > task_h_loads
> > > of the runnable tasks. This works because the hierarchical
> > > runnable_load of
> > > a task is already equal to the task_se_h_load today. This provides
> > > enough
> > > information to the load balancer.
> > >
> > > The runnable_load_avg of the cgroup cfs_rqs does not appear to be
> > > used for anything, so don't bother calculating those.
> > >
> > > This removes one of the things that the code currently traverses
> > > the
> > > cgroup hierarchy for, and getting rid of it brings us one step
> > > closer
> > > to a flat runqueue for the CPU controller.
> > >
> >
> > My memory on this stuff is very hazy, but IIRC we had the
> > runnable_sum and the
> > runnable_avg separated out because you could have the avg lag behind
> > the sum.
> > So like you enqueue a bunch of new entities who's avg may have
> > decayed a bunch
> > and so their overall load is not felt on the CPU until they start
> > running, and
> > now you've overloaded that CPU. The sum was there to make sure new
> > things
> > coming onto the CPU added actual load to the queue instead of looking
> > like there
> > was no load.
> >
> > Is this going to be a problem now with this new code?
>
> That is a good question!
>
> On the one hand, you may well be right.
>
> On the other hand, while I see the old code calculating
> runnable_sum, I don't really see it _using_ it to drive
> scheduling decisions.
>
> It would be easy to define the CPU cfs_rq->runnable_load_sum
> as being the sum of task_se_h_weight() of each runnable task
> on the CPU (for example), but what would we use it for?
>
> What am I missing?

It's suuuuuper sublte, but you're right in that we don't really need the
runnable_avg per-se, but what you do is you kill calc_group_runnable, which used
to do this

load_avg = max(cfs_rq->avg.load_avg,
scale_load_down(cfs_rq->load.weight));

runnable = max(cfs_rq->avg.runnable_load_avg,
scale_load_down(cfs_rq->runnable_weight));

so we'd account for this weirdness of adding a previously idle process to a new
CPU and overloading the CPU because we'd add a bunch of these 0 weight looking
tasks that suddenly all wake up and are on the same CPU. So we used the
runnable_weight to account for what was actually happening, and the max of
load_avg and the weight to figure out what the potential load would be.

What you've done here is change the weighting stuff to be completely based on
load avg, which is problematic for the reasons above. Did you fix this later on
in your patches? If so then just tell me to keep reading and I'll do that ;).
Thanks,

Josef