Re: [RFC PATCH v6 6/9] thermal: cpu_cooling: implement the power cooling device API

From: Javi Merino
Date: Thu Jan 29 2015 - 14:06:27 EST


On Thu, Jan 29, 2015 at 12:15:07AM +0000, Eduardo Valentin wrote:
> > Hi Eduardo,
> >
> > Eduardo Valentin <edubezval@xxxxxxxxx> writes:
> >
> > > Hello Javi,
> > >
> > > On Fri, Dec 05, 2014 at 07:04:17PM +0000, Javi Merino wrote:
> > >> Add a basic power model to the cpu cooling device to implement the
> > >> power cooling device API. The power model uses the current frequency,
> > >> current load and OPPs for the power calculations. The cpus must have
> > >> registered their OPPs using the OPP library.
> > >>
> > >> Cc: Zhang Rui <rui.zhang@xxxxxxxxx>
> > >> Cc: Eduardo Valentin <edubezval@xxxxxxxxx>
> > >> Signed-off-by: Punit Agrawal <punit.agrawal@xxxxxxx>
> > >> Signed-off-by: Javi Merino <javi.merino@xxxxxxx>
> > >
> > > <big cut>
> > >
> > >> +
> > >> +/**
> > >> + * get_load() - get load for a cpu since last updated
> > >> + * @cpufreq_device: &struct cpufreq_cooling_device for this cpu
> > >> + * @cpu: cpu number
> > >> + *
> > >> + * Return: The average load of cpu @cpu in percentage since this
> > >> + * function was last called.
> > >> + */
> > >> +static u32 get_load(struct cpufreq_cooling_device *cpufreq_device, int cpu)
> > >> +{
> > >> +	u32 load;
> > >> +	u64 now, now_idle, delta_time, delta_idle;
> > >> +
> > >> +	now_idle = get_cpu_idle_time(cpu, &now, 0);
> > >> +	delta_idle = now_idle - cpufreq_device->time_in_idle[cpu];
> > >> +	delta_time = now - cpufreq_device->time_in_idle_timestamp[cpu];
> > >> +
> > >> +	if (delta_time <= delta_idle)
> > >> +		load = 0;
> > >> +	else
> > >> +		load = div64_u64(100 * (delta_time - delta_idle), delta_time);
> > >> +
> > >> +	cpufreq_device->time_in_idle[cpu] = now_idle;
> > >> +	cpufreq_device->time_in_idle_timestamp[cpu] = now;
> > >> +
> > >> +	return load;
> > >> +}
> > >
> > > <cut>
> > >
> > >>
> > >> +/**
> > >> + * cpufreq_get_actual_power() - get the current power
> > >> + * @cdev: &thermal_cooling_device pointer
> > >> + *
> > >> + * Return the current power consumption of the cpus in milliwatts.
> > >> + */
> > >> +static u32 cpufreq_get_actual_power(struct thermal_cooling_device *cdev)
> > >> +{
> > >> +	unsigned long freq;
> > >> +	int cpu;
> > >> +	u32 static_power, dynamic_power, total_load = 0;
> > >> +	struct cpufreq_cooling_device *cpufreq_device = cdev->devdata;
> > >> +
> > >> +	freq = cpufreq_quick_get(cpumask_any(&cpufreq_device->allowed_cpus));
> > >> +
> > >> +	for_each_cpu(cpu, &cpufreq_device->allowed_cpus) {
> > >> +		u32 load;
> > >> +
> > >> +		if (cpu_online(cpu))
> > >> +			load = get_load(cpufreq_device, cpu);
> > >> +		else
> > >> +			load = 0;
> > >> +
> > >> +		total_load += load;
> > >> +	}
> > >> +
> > >> +	cpufreq_device->last_load = total_load;
> > >> +
> > >> +	static_power = get_static_power(cpufreq_device, freq);
> > >> +	dynamic_power = get_dynamic_power(cpufreq_device, freq);
> > >> +
> > >> +	return static_power + dynamic_power;
> > >> +}
> > >
> > > I have a question about load computation vs. frequency usage vs. power
> > > estimation when getting the actual power for a given interval T. What if
> > > in interval T we have used, say, 3 different cpu frequencies, and the
> > > load on the first was 50%, on the second 80%, and on the last frequency
> > > the load was 60%? What should be the right load value for computing the
> > > actual power?
> > >
> > > I mean, we are using the total idle time for a given interval, but
> > > 1 - idle does not always seem to reflect the actual load on different
> > > OPPs, if the OPPs change over time within the interval T.
> >
> > The value returned by cpufreq_get_actual_power is used as a proxy for
> > the estimate of the requested power of the actor for the next window
> > duration. Even though the frequency might have changed in the previous
> > period, the current frequency reflects the latest information about the
> > required performance. As it is an estimate, and to avoid making the
> > power calculations more complicated, we used utilisation (1 - idle time)
> > to calculate the request. The estimate for the T+1 period becomes more
> > accurate as the load stabilises.
> >
> > In our testing on different workloads using 100ms as the polling period
> > for thermal control, we didn't see any problems arising from the use of
> > this definition of utilisation.
> >
> > Having said that, there are a number of ways to improve the accuracy of
> > the power calculations. As part of investigating how improving model
> > accuracy affects thermal control and performance, we plan to look at
> > fine-grained frequency and load tracking once the initial set of
> > patches is merged.
>
> In this case, I believe we must mark the code at least with a TODO or
> REVISIT mark. Can we add the above comments within a REVISIT: mark in
> this part of the code?

Ok, we will add a comment that summarizes this discussion around this
area of code, acknowledging the simplification and hinting that we
will look into improving it.
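
For illustration only, here is a rough userspace sketch of what the
simplification amounts to. It is not code from the patch: load_pct()
mirrors the arithmetic in get_load() above, while struct freq_slice and
window_load_pct() are hypothetical names made up for this example. The
point is that only the total time and total idle time of the window
enter the calculation, so how the load was spread across frequencies
inside the window is not visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same arithmetic as get_load(): busy share of a window, in percent. */
static uint32_t load_pct(uint64_t delta_time, uint64_t delta_idle)
{
	if (delta_time <= delta_idle)
		return 0;

	return (uint32_t)(100 * (delta_time - delta_idle) / delta_time);
}

/* Hypothetical record of one sub-interval spent at a single frequency. */
struct freq_slice {
	uint64_t time_us;	/* length of the sub-interval */
	uint64_t idle_us;	/* idle time within the sub-interval */
};

/*
 * Load over the whole window when only the totals are known, which is
 * the situation get_load() is in: the per-slice split is summed away.
 */
static uint32_t window_load_pct(const struct freq_slice *s, size_t n)
{
	uint64_t time = 0, idle = 0;
	size_t i;

	assert(s != NULL && n > 0);

	for (i = 0; i < n; i++) {
		time += s[i].time_us;
		idle += s[i].idle_us;
	}

	return load_pct(time, idle);
}
```

Taking Eduardo's example with three equal 30 ms slices at 50%, 80% and
60% load (idle times of 15000, 6000 and 12000 us), window_load_pct()
returns 63, and that single value is then paired with whatever the
current frequency is at the time of the call.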

Cheers,
Javi