Re: [PATCH 5/6] sched, fair: Make group power more consitent

From: Peter Zijlstra
Date: Mon Aug 19 2013 - 06:30:51 EST


On Mon, Aug 19, 2013 at 09:47:47AM +0530, Preeti U Murthy wrote:
> Hi Peter,
>
> On 08/16/2013 03:42 PM, Peter Zijlstra wrote:
>
> I have a few comments and clarification to seek.
>
> 1. How are you ensuring from this patch that sgs->group_power does not
> change over the course of load balancing?

Well, we only set it the one time when creating the sgs data.

> The only path to update_group_power() where sg->sgp->power gets
> updated, is from update_sg_lb_stats(). You are updating sgs->group_power
> in update_sg_lb_stats(). Any change to group->sgp->power will get
> reflected in sgs->group_power as well right?

Nope, we set it to whatever value group->sgp->power is at the time of
sgs 'creation' and live with that value from then on. We do this after
the possible update_group_power() call.

That said, it is very rare to have group->sgp->power change during the
load-balance pass, we would have to trigger the time_after case for
NEWLY_IDLE and then get a concurrent !NEWLY_IDLE load-balance pass.

This patch takes away that possibility and uses a consistent group power
reading for the entire load-balance invocation as well as does away with
that double dereference all the time.

> 2. This point is aside from your patch. In the current implementation,
> each time the cpu power gets updated in update_cpu_power(), should not
> the power of the sched_groups comprising of that cpu also get updated?
> Why wait till the load balancing is done at the sched_domain level of
> that group, to update its group power?

What would be the advantage of doing so? We also take snapshots of
cpu/group/domain load, we don't update those either. Having all that
dynamically update during the load-balance pass would make it an
impossible situation.

You'd fail to meet progress guarantees that way since you'd never be
able to pin-point a 'busiest' group/cpu because by the time you've made
any decision you have to go back to make it again because things might
have changed again.

So instead what we do is we take a snapshot and live with that state. If
the values change so fast that our load-balance pass is invalid by the
time we're finished, too bad, better luck next time.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/