Re: [RFC v1] Tunable sched_mc_power_savings=n

From: Dipankar Sarma
Date: Mon Jun 30 2008 - 12:14:51 EST


On Fri, Jun 27, 2008 at 10:03:06AM +0200, Andi Kleen wrote:
> Dipankar Sarma wrote:
> > On Thu, Jun 26, 2008 at 11:37:08PM +0200, Andi Kleen wrote:
> >> Dipankar Sarma wrote:
> >>
> > The current usage we are looking at requires system-wide
> > settings. That means nicing every process running on the system.
> > That seems a little messy.
>
> Is it less messy than letting applications negotiate
> for the best policy by themselves as someone else suggested on the thread?

I don't think letting applications negotiate among
themselves is a good idea. The kernel should do that.

> > Secondly, even if you nice the processes
> > they are still going to be spread all over the CPU packages
> > running at lower frequencies due to nice.
>
> My point was that this could be fixed and you could use nice
> (or another per process parameter if you prefer)
> as an input to load balancer decisions.
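The following C is a minimal sketch of using nice as a load balancer input. It is not the kernel's load balancer. struct package, PACK_NICE_THRESHOLD and pick_package() are hypothetical names, and the cut-off of 10 is an arbitrary assumption. Normal tasks are spread to the least loaded package. Heavily niced tasks are packed onto the busiest package that still has room, so the other packages can stay idle.

```c
#include <stddef.h>

/* Hypothetical per-package state, not a real kernel structure. */
struct package {
    int nr_running; /* runnable tasks currently on the package */
    int capacity;   /* tasks it can take before it counts as full */
};

/* Hypothetical cut-off (assumption): tasks niced at or above this are packed. */
#define PACK_NICE_THRESHOLD 10

/* Index of the least loaded package, or -1 if there are none. */
static int least_loaded(const struct package *pkgs, size_t npkg)
{
    int best = -1;

    for (size_t i = 0; i < npkg; i++)
        if (best < 0 || pkgs[i].nr_running < pkgs[best].nr_running)
            best = (int)i;
    return best;
}

/*
 * Choose a package for a task, using its nice value as a power hint.
 * Normal tasks go to the least loaded package for throughput.
 * Heavily niced tasks go to the busiest package that still has room,
 * so idle packages can stay idle. If every package is full, they fall
 * back to the least loaded one. Returns -1 if npkg is 0.
 */
int pick_package(const struct package *pkgs, size_t npkg, int nice)
{
    int best = -1;

    if (nice < PACK_NICE_THRESHOLD)
        return least_loaded(pkgs, npkg);

    for (size_t i = 0; i < npkg; i++) {
        if (pkgs[i].nr_running >= pkgs[i].capacity)
            continue; /* full: packing here would overload it */
        if (best < 0 || pkgs[i].nr_running > pkgs[best].nr_running)
            best = (int)i;
    }
    return best >= 0 ? best : least_loaded(pkgs, npkg);
}
```

With packages loaded {1, 0, 3} out of 4 each, a task at nice 15 lands on package 2 (packing) and a task at nice 0 lands on package 1 (spreading).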

Agreed. A variation of this, which allows tasks to indicate
their CPU power requirement, is something we experimented
with long ago. There are some difficult issues that need to be
sorted out if this is to be effective:

1. For some applications, like xmms, the power requirement is easy
to predict. For commercial workloads, like a database, it is hard
to get right.

2. Conflicting power requirements are hard to resolve. Grouping
of tasks based on various combinations of power requirement
is complex.

3. Setting a global policy is expensive: you have to loop through
all the tasks in the system.
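Issues #2 and #3 can be made concrete with a small C sketch. It assumes a hypothetical per-task power_hint field (0 = full performance, POWER_HINT_MAX = maximum saving). None of these names exist in the kernel. Applying a system-wide policy through per-task hints costs one write per task. Resolving conflicting hints on a package by honouring the most demanding task is one simple, lossy choice, among others.

```c
#include <stddef.h>

/* Hypothetical hint scale: 0 = full performance, POWER_HINT_MAX = max saving. */
#define POWER_HINT_MAX 2

struct task {
    int power_hint; /* hypothetical per-task power requirement */
};

/*
 * Issue #3: with only per-task hints, a system-wide policy change has
 * to be written into every task. Returns the number of tasks touched,
 * i.e. the cost is O(number of tasks).
 */
size_t set_policy_all_tasks(struct task *tasks, size_t ntasks, int level)
{
    for (size_t i = 0; i < ntasks; i++)
        tasks[i].power_hint = level;
    return ntasks;
}

/*
 * Issue #2: tasks sharing a package can disagree. This resolution
 * honours the most demanding task (the lowest hint), so one
 * performance-hungry task vetoes savings for the whole package.
 * An empty package defaults to maximum power saving.
 */
int resolve_package_hint(const struct task *tasks, size_t ntasks)
{
    int hint = POWER_HINT_MAX;

    for (size_t i = 0; i < ntasks; i++)
        if (tasks[i].power_hint < hint)
            hint = tasks[i].power_hint;
    return hint;
}
```

A package holding tasks with hints {2, 0, 1} resolves to 0, so the single performance-sensitive task decides for everyone on it.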

> > We are talking about a different optimization here - something
> > that will give more benefits in powersave mode when you have large
> > systems.
>
> Yes it's a different optimization (although the overall theme -- power saving
> -- is the same), but is there a real reason it cannot be driven from the
> same per process heuristics instead of your ugly global sysctl?

See issues #1 and #2 above. Apart from that, what we discovered
was that server admins really want a global setting at the moment.
Any finer granularity than that would be wasted on them for now.
No one is really looking at running php+mysql at one powernice
level and tomcat at another *in the same server*.


> My point was just that the heuristics
> used by one power saving mechanism (ondemand) could be used
> for the other too (socket grouping) -- and it would be certainly
> a far saner interface than a global sysctl!

Per-task settings were the first thing we looked at when we
started out. I think we should experiment with it and see
if we can come up with a simple implementation that handles
conflicting requirements well. If this can also handle global
system power settings without having to loop through all the
tasks in the system, I am OK with it.
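One way to combine per-task settings with a global policy without looping through every task is to resolve the policy lazily. The global value is a single variable that overrides the per-task hint whenever it is set. The C below is a sketch under that assumption only. global_power_level, set_global_power_level() and effective_power_level() are hypothetical names, and "global overrides task" is just one possible precedence rule. Taking the maximum of the two would be another.

```c
struct task {
    int power_hint; /* hypothetical per-task hint, 0 = full performance */
};

/* Hypothetical system-wide override; -1 means "no override, use the
 * per-task hint". */
static int global_power_level = -1;

/* Changing the global policy is a single store, independent of the
 * number of tasks in the system. */
void set_global_power_level(int level)
{
    global_power_level = level;
}

/* The policy is resolved when a decision is made (e.g. when the load
 * balancer looks at a task), so no task ever has to be rewritten. */
int effective_power_level(const struct task *t)
{
    return global_power_level >= 0 ? global_power_level : t->power_hint;
}
```

Clearing the override (setting it back to -1) restores every task's own hint, again without touching any task.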


Thanks
Dipankar