Re: [PATCH 2/6] drivers base/arch_topology: frequency-invariant load-tracking support

From: Dietmar Eggemann
Date: Wed Jun 21 2017 - 12:38:36 EST


On 20/06/17 07:17, Viresh Kumar wrote:
> On Thu, Jun 8, 2017 at 1:25 PM, Dietmar Eggemann
> <dietmar.eggemann@xxxxxxx> wrote:
>
>> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
>
>> static int
>> init_cpu_capacity_callback(struct notifier_block *nb,
>> @@ -185,6 +192,7 @@ init_cpu_capacity_callback(struct notifier_block *nb,
>> cpus_to_visit,
>> policy->related_cpus);
>> for_each_cpu(cpu, policy->related_cpus) {
>> + per_cpu(max_freq, cpu) = policy->cpuinfo.max_freq;
>
> I am not sure about this but why shouldn't we use policy->max here ?
> As that is the
> max, we can set the frequency to right now.
>

No, frequency invariance is defined by:

current_freq(cpu) << SCHED_CAPACITY_SHIFT / max_supported_freq(cpu)

We don't want to scale against a value which might be restricted e.g.
by thermal capping.

>> if (cap_parsing_failed)
>> continue;
>> raw_capacity[cpu] = topology_get_cpu_scale(NULL, cpu) *

[...]

>> +static int set_freq_scale_callback(struct notifier_block *nb,
>> + unsigned long val,
>> + void *data)
>> +{
>> + struct cpufreq_freqs *freq = data;
>> +
>> + switch (val) {
>> + case CPUFREQ_PRECHANGE:
>> + set_freq_scale(freq->cpu, freq->new);
>
> Any specific reason on why are we doing this from PRECHANGE and
> not POSTCHANGE ? i.e. we are doing this before the frequency is
> really updated.

Not really. In case I get a CPUFREQ_POSTCHANGE all the time the
frequency actually changed I can switch to CPUFREQ_POSTCHANGE.

[...]

>> @@ -225,8 +265,14 @@ static int __init register_cpufreq_notifier(void)
>>
>> cpumask_copy(cpus_to_visit, cpu_possible_mask);
>>
>> - return cpufreq_register_notifier(&init_cpu_capacity_notifier,
>> - CPUFREQ_POLICY_NOTIFIER);
>> + ret = cpufreq_register_notifier(&init_cpu_capacity_notifier,
>> + CPUFREQ_POLICY_NOTIFIER);
>
> Wanted to make sure that we all understand the constraints this is going to add
> for the ARM64 platforms.
>
> With the introduction of this transition notifier, we would not be able to use
> the fast-switch path in the schedutil governor. I am not sure if there are any
> ARM platforms that can actually use the fast-switch path in future or not
> though. The requirement of fast-switch path is that the freq can be changed
> without sleeping in the hot-path.

That's a good point. The cpufreq transition notifier based Frequency
Invariance Engine (FIE) can only work if none of the cpufreq policies
support fast frequency switching.

What about we still enable cpufreq transition notifier based FIE for
systems where this is true. This will cover 100% of all arm/arm64
systems today.

In case one day we have a cpufreq driver which allows fast frequency
switching we would need a FIE based on something else than cpufreq
transition notifier. Maybe based on performance counters (something
similar to x86 APERF/MPERF) or cpufreq core could provide a function
which provides the avg frequency value.

I could make the current implementation more future-proof by only
using the notifier based FIE in case all policies use slow frequency
switching: