Re: change in sched cpu_power causing regressions with SCHED_MC

From: Suresh Siddha
Date: Fri Feb 19 2010 - 14:17:55 EST


On Fri, 2010-02-19 at 05:03 -0800, Vaidyanathan Srinivasan wrote:
> > -	/* Don't want to pull so many tasks that a group would go idle */
> > -	max_pull = min(sds->max_load - sds->avg_load,
> > -			sds->max_load - sds->busiest_load_per_task);
> > +	if (!sds->group_imb) {
> > +		/*
> > +		 * Don't want to pull so many tasks that a group would go idle.
> > +		 */
> > +		load_above_capacity = (sds->busiest_nr_running -
> > +						sds->busiest_group_capacity);
> > +
> > +		load_above_capacity *= (SCHED_LOAD_SCALE * SCHED_LOAD_SCALE);
> > +
> > +		load_above_capacity /= sds->busiest->cpu_power;
> > +	}
>
> This seems tricky. max_load - avg_load will be less than
> load_above_capacity most of the time. How does this expression
> increase the max_pull from previous expression?

I am not trying to increase or decrease it relative to the previous
expression. I am just trying to compute the right quantity (ultimately to
address smt/mc power-savings), since "max_load - busiest_load_per_task"
no longer represents the load above capacity.

>
> > +	/*
> > +	 * We're trying to get all the cpus to the average_load, so we don't
> > +	 * want to push ourselves above the average load, nor do we wish to
> > +	 * reduce the max loaded cpu below the average load, as either of these
> > +	 * actions would just result in more rebalancing later, and ping-pong
> > +	 * tasks around. Thus we look for the minimum possible imbalance.
> > +	 * Negative imbalances (*we* are more loaded than anyone else) will
> > +	 * be counted as no imbalance for these purposes -- we can't fix that
> > +	 * by pulling tasks to us. Be careful of negative numbers as they'll
> > +	 * appear as very large values with unsigned longs.
> > +	 */
> > +	max_pull = min(sds->max_load - sds->avg_load, load_above_capacity);
>
> Does this increase or decrease the value of max_pull from previous
> expression?

Does the above help answer your question, Vaidy?

>
> > 	/* How much load to actually move to equalise the imbalance */
> > 	*imbalance = min(max_pull * sds->busiest->cpu_power,
> > @@ -4069,19 +4097,6 @@ find_busiest_group(struct sched_domain *sd, int this_cpu,
> > 		sds.busiest_load_per_task =
> > 			min(sds.busiest_load_per_task, sds.avg_load);
> >
> > -	/*
> > -	 * We're trying to get all the cpus to the average_load, so we don't
> > -	 * want to push ourselves above the average load, nor do we wish to
> > -	 * reduce the max loaded cpu below the average load, as either of these
> > -	 * actions would just result in more rebalancing later, and ping-pong
> > -	 * tasks around. Thus we look for the minimum possible imbalance.
> > -	 * Negative imbalances (*we* are more loaded than anyone else) will
> > -	 * be counted as no imbalance for these purposes -- we can't fix that
> > -	 * by pulling tasks to us. Be careful of negative numbers as they'll
> > -	 * appear as very large values with unsigned longs.
> > -	 */
> > -	if (sds.max_load <= sds.busiest_load_per_task)
> > -		goto out_balanced;
>
> This is right. This condition was treating most cases as balanced and
> exiting right here. However, if this check is removed, we will have to
> execute more code to detect/ascertain the balanced case.

To add, in update_sd_lb_stats() we are already doing this:

	} else if (sgs.avg_load > sds->max_load &&
		   (sgs.sum_nr_running > sgs.group_capacity ||
		    sgs.group_imb)) {

So we are already checking sum_nr_running > group_capacity when selecting
the busiest group, i.e. we do the equivalent of the removed balanced
check much earlier.

thanks,
suresh
