Re: [RFC PATCH v2 0/2] Saving power by cpu evacuation sched_max_capacity_pct=n

From: Vaidyanathan Srinivasan
Date: Thu May 14 2009 - 11:14:14 EST


* Andi Kleen <andi@xxxxxxxxxxxxxx> [2009-05-13 17:01:00]:

> > From what I've been told it's popular to over-commit the cooling capacity
> > in a rack, so that a number of servers can run at full thermal capacity
> > but not all.
>
> Yes. But in this case you don't want to use throttling, you want
> to use P-states, which actually save power, unlike throttling.

One of the design points for this discussion is to bring C-States
into the equation. As you mentioned, today we can effectively use
P-States to reduce core frequency and thereby reduce average power
and heat. With the introduction of very low power deep sleep states
in the processor, C-States can provide substantial power savings
beyond what P-State based methods alone achieve. Forcefully idling
cores lets them enter these C-States and realize their power savings.

As mentioned earlier, CPU throttling as it exists today should not
be used under normal operating conditions. However, by exploiting
P-States and C-States as two control variables, the system can be
made to operate at various power (thermal) and performance points.

>
> > I've also been told that hardware sucks at throttling,
>
> Throttling is not really something you should use in normal
> operation; it's just an emergency measure. For that it works
> quite well, but you really don't want it in normal operation.
>
> > therefore people
> > want to fix the OS so as to limit the thermal capacity and avoid the
> > hardware throttle from kicking in, whilst still not exceeding the rack
> > capacity or similar nonsense.
>
> Yes, that's fine and common, but you actually need to save power for this,
> which throttling doesn't do.

Reducing work and scheduling it smartly in the OS can save far
more power than throttling in hardware as a means of reducing
power or heat.

> My understanding is that this work is an extension of the existing
> sched_mc_power_savings feature that optionally tries to be more
> aggressive about keeping a complete package idle so that
> package-level power saving kicks in.

Scheduling work smartly (power efficiently) is part of the
sched_mc_power_savings framework, while this RFC/discussion is about
reducing work, or forcing idle time, at the granularity of
cores/packages to provide maximum power/thermal benefits.

--Vaidy