Re: [RFC PATCH 06/16] arm: topology: Define TC2 sched energy and provide it to scheduler

From: Morten Rasmussen
Date: Mon Jun 09 2014 - 05:00:03 EST


On Sun, Jun 08, 2014 at 12:26:29AM +0100, Yuyang Du wrote:
> On Fri, Jun 06, 2014 at 12:50:36PM +0200, Peter Zijlstra wrote:
> > > Voltage is combined with frequency; roughly, voltage is proportional
> > > to frequency, so roughly, power is proportional to voltage^3. You
> >
> > P ~ V^2, last time I checked.
> >
> > > can't say which is more important, or there is no reason to raise
> > > voltage without raising frequency.
> >
> > Well, some chips have far fewer voltage steps than freq steps; or,
> > differently put, they have multiple freq steps for a single voltage
> > level.
> >
> > And since the power (Watts) is proportional to Voltage squared, it's the
> > biggest term.
> >
> > If you have a distinct voltage level for each freq, it all doesn't
> > matter.
> >
>
> Ok. I think we understand each other. But one more thing, I said P ~ V^3,
> because P ~ V^2*f and f ~ V, so P ~ V^3. Maybe some frequencies share the same
> voltage, but you can still safely assume V changes with f in general, and it
> will be more and more so, since we do need finer control over power consumption.

Agreed. Voltage typically changes with frequency.
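
To put rough numbers on the two cases, here is a minimal sketch assuming
the usual dynamic power approximation P = C * V^2 * f. The capacitance
value, the operating points and the helper name are made up for
illustration; none of this is TC2 data.

```python
# Illustrative only: dynamic CMOS power, P = C_eff * V^2 * f.
# All numbers below are assumptions, not measurements of any real chip.

C_EFF = 1.0e-9  # effective switched capacitance in farads (assumed)


def dynamic_power(volt, freq_hz, c_eff=C_EFF):
    """Return dynamic power in watts for a given voltage and frequency."""
    return c_eff * volt ** 2 * freq_hz


# Hypothetical operating points: (frequency in Hz, voltage in V).
# The first two share one voltage level; the third raises both.
opps = [(500e6, 0.9), (600e6, 0.9), (1000e6, 1.1)]

for freq, volt in opps:
    watts = dynamic_power(volt, freq)
    print(f"{freq / 1e6:6.0f} MHz @ {volt:.2f} V -> {watts:.3f} W")
```

With a shared voltage level the power only grows linearly with
frequency. If voltage really did scale one-to-one with frequency, the
same formula gives the cubic behaviour you describe.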

>
> > Sure, but realize that we must fully understand this governor and
> > integrate it in the scheduler if we're to attain the goal of IPC/watt
> > optimized scheduling behaviour.
> >
>
> Attain the goal of IPC/watt optimized?
>
> I don't see how it can be done like this. As I said, what is unknown for
> prediction is perf scaling *and* changing workload. So the challenge for pstate
> control is in both. But I see more challenge in the changing workload than
> in the performance scaling or the resulting IPC impact (if workload is
> fixed).

IMHO, the per-entity load-tracking does a fair job representing the task
compute capacity requirements. Sure it isn't perfect, particularly not
for memory bound tasks, but it is way better than not having any task
history at all, which was the case before.
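
For reference, the idea behind the tracking can be sketched as a
geometrically decayed average over ~1 ms periods, with the decay factor
chosen so that a contribution halves after 32 periods. This is a
simplified floating-point model, not the kernel code; the function name
and the per-period input format are assumptions.

```python
# Simplified model of a decayed runnable average (not the kernel
# implementation, which uses fixed-point sums).

Y = 0.5 ** (1 / 32)  # per-period decay factor, so that Y**32 == 0.5


def tracked_load(runnable_fractions):
    """Return the decayed runnable average in [0, 1].

    runnable_fractions: fraction of each period the task was runnable,
    oldest period first.
    """
    runnable_sum = 0.0
    period_sum = 0.0
    for frac in runnable_fractions:
        # Age the history by one period, then add the newest period.
        runnable_sum = runnable_sum * Y + frac
        period_sum = period_sum * Y + 1.0
    return runnable_sum / period_sum if period_sum else 0.0


# A task that ran flat out for 32 periods and then slept for 32.
print(tracked_load([1.0] * 32 + [0.0] * 32))
```

In this model recent behaviour dominates but history is not thrown
away: the task above still reports about a third of a CPU after
sleeping for as long as it ran.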

The story is more or less the same for performance scaling. It is not
taken into account at all in the scheduler at the moment. cpufreq is
actually messing up load-balancing decisions after task load-tracking
was introduced. Adding performance scaling awareness should only make
things better even if predictions are not accurate for all workloads. I
don't see why it shouldn't, given the current state of energy-awareness
in the scheduler.
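
A toy illustration of the problem, assuming the simplest possible
correction (scaling the tracked runnable fraction by current/max
frequency). The helper name and the numbers are hypothetical, and this
ignores workloads that do not scale linearly with frequency.

```python
def freq_invariant_load(runnable_fraction, cur_freq_khz, max_freq_khz):
    """Scale a tracked runnable fraction by the frequency it was measured at."""
    return runnable_fraction * cur_freq_khz / max_freq_khz


# A task needing 25% of a CPU at full speed looks 50% busy when cpufreq
# has the CPU at half speed. Scaling brings the two views back in line.
raw_at_half_speed = 0.50
print(freq_invariant_load(raw_at_half_speed, 500_000, 1_000_000))
```

Without some correction of this kind, the load-balancer compares
numbers measured at different frequencies as if they were equivalent.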

> Currently, all freq governors take CPU utilization (load%) as the indicator
> (target), which can serve both: workload and perf scaling.

With a bunch of hacks on top to make it more reactive because the
current cpu utilization metric is not responsive enough to deal with
workload changes. That is at least the case for ondemand and interactive
(in Android).

> As for IPC/watt optimized, I don't see how it can be practical. Too micro to
> be used for the general well-being?

That is why I propose to have a platform-specific energy model. You tell
the scheduler enough about your platform that it understands the most
basic power/performance trade-offs of your platform, which enables
the scheduler to make better decisions.
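
As a rough sketch of what such a model could look like: a small table
of capacity states per CPU type, each with a compute capacity and a
busy power, plus an idle power. All names, the layout and the numbers
below are made up for illustration; they are not the data structures or
the TC2 values from the patch.

```python
from collections import namedtuple

# Hypothetical capacity state: compute capacity (1024 = fastest CPU at
# its highest frequency) and power in mW when fully busy in that state.
CapState = namedtuple("CapState", "capacity busy_power_mw")

# Made-up tables, sorted by increasing capacity.
LITTLE = [CapState(358, 200), CapState(512, 350)]
BIG = [CapState(614, 900), CapState(1024, 2400)]
IDLE_POWER_MW = {"little": 10, "big": 30}


def estimate_power(cap_states, idle_power_mw, util):
    """Estimate average power (mW) for running `util` on one CPU.

    Picks the lowest capacity state that fits `util` and assumes the CPU
    idles for the rest of the time. Returns None if `util` does not fit.
    """
    for state in cap_states:
        if util <= state.capacity:
            busy = util / state.capacity
            return busy * state.busy_power_mw + (1 - busy) * idle_power_mw
    return None


util = 179  # task utilization in capacity units (assumed)
print("little:", estimate_power(LITTLE, IDLE_POWER_MW["little"], util))
print("big:   ", estimate_power(BIG, IDLE_POWER_MW["big"], util))
```

Even a table this coarse lets the scheduler compare placing a task on
one CPU type versus another in terms of estimated power instead of load
alone.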