Re: [PATCH v5 2/2] sched/topology: change behaviour of sysctl sched_energy_aware based on the platform

From: Shrikanth Hegde
Date: Tue Oct 03 2023 - 08:27:41 EST




On 10/3/23 2:50 PM, Pierre Gondois wrote:
> Hello Shrikanth,
> Some NITs about the commit message:
>

Hi Pierre.


> On 9/29/23 17:52, Shrikanth Hegde wrote:
>> sysctl sched_energy_aware is available for the admin to disable/enable
>> energy aware scheduling(EAS). EAS is enabled only if few conditions are
>> met by the platform. They are, asymmetric CPU capacity, no SMT,
>> schedutil CPUfreq governor, frequency invariant load tracking etc.
>> A platform may boot without EAS capability, but could gain such
>> capability at runtime For example, changing/registering the CPUfreq
>
> Missing dot I think: 'runtime. For example,'

ok.

>
>> governor to schedutil.
>>
>> At present, though platform doesn't support EAS, this sysctl returns 1
>> and it ends up calling build_perf_domains on write to 1 and
>> NOP when writing to 0. That is confusing and un-necessary.
>
This is the current problematic behavior that patch 2/2 tries to address.

> I'm not sure I fully understand the sentence:
> - it sounds that the user is writing a value to either 1/0
>   (I think the user is writing 1/0 to the sysctl)

Yes. Any user with root privileges can read and write this file.
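
To illustrate what a reader of the file would see with this patch, here is a
minimal userspace sketch, not part of the patch. It assumes the bytes have
already been read from /proc/sys/kernel/sched_energy_aware; the enum and the
function name are hypothetical and exist only for this example.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum eas_state {
	EAS_NOT_POSSIBLE,	/* read returned nothing */
	EAS_DISABLED,		/* read returned "0": supported, disabled by admin */
	EAS_ENABLED,		/* read returned "1": supported and enabled */
	EAS_UNKNOWN		/* anything else */
};

/* Classify the bytes returned by a read of the sysctl file. */
static enum eas_state classify_eas_read(const char *buf, size_t len)
{
	/* Drop the trailing newline that procfs integer handlers append. */
	while (len > 0 && buf[len - 1] == '\n')
		len--;

	if (len == 0)
		return EAS_NOT_POSSIBLE;
	if (len == 1 && buf[0] == '0')
		return EAS_DISABLED;
	if (len == 1 && buf[0] == '1')
		return EAS_ENABLED;
	return EAS_UNKNOWN;
}
```

A caller would pass the buffer and the byte count returned by read(); a count
of 0 maps to "EAS is not possible at this moment".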

> - aren't the sched domain rebuilt even when writing 0 to the sysctl ?
>   I'm not sure I understand to what the NOP is referring to exactly.
>

The sched domains aren't completely rebuilt, since this case jumps to the match1 and match2 labels.

> What about:
> Platforms without EAS capability currently advertise this sysctl.
> Its effects (i.e. rebuilding sched-domains) is unnecessary on
> such platforms and its presence can be confusing.
>
Looks ok. IMHO the changelog had already described this in detail.


>>
>> Desired behavior would be to, have this sysctl to enable/disable the EAS
>
> Unnecessary comma I think
>
>> on supported platform. On Non supported platform write to the sysctl
>
> Non supported  -> non-supported

ok for the above two nits.

>
>> would return not supported error and read of the sysctl would return
>> empty. So
>> sched_energy_aware returns empty - EAS is not possible at
>> this moment
>> This will include EAS capable platforms which have at least one EAS
>> condition false during startup, e.g. using a Performance CPUfreq governor
>
> Just a remark, using the performance governor is not exactly a condition
> disabling EAS, it is more 'not using the schedutil CPUfreq governor'
>

ok.
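
On the write side, the handler in this patch returns -EPERM to a caller
without CAP_SYS_ADMIN and -EOPNOTSUPP when EAS is not possible. A rough
userspace sketch of how a tool could interpret the errno after a failed
write() follows. It is not part of the patch, and the enum and function
names are hypothetical.

```c
#include <errno.h>

/* Hypothetical names, for illustration only. */
enum eas_write_result {
	EAS_WRITE_OK,
	EAS_WRITE_NOT_POSSIBLE,		/* EOPNOTSUPP: platform can't do EAS now */
	EAS_WRITE_NOT_PERMITTED,	/* EPERM: caller lacks the privilege */
	EAS_WRITE_FAILED		/* any other error, e.g. EINVAL */
};

/* Map the errno seen after writing to the sysctl (0 on success). */
static enum eas_write_result classify_eas_write(int err)
{
	switch (err) {
	case 0:
		return EAS_WRITE_OK;
	case EOPNOTSUPP:
		return EAS_WRITE_NOT_POSSIBLE;
	case EPERM:
		return EAS_WRITE_NOT_PERMITTED;
	default:
		return EAS_WRITE_FAILED;
	}
}
```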

>> sched_energy_aware returns 0 - EAS is supported but disabled by admin.
>> sched_energy_aware returns 1 - EAS is supported and enabled.
>>
>> User can find out the reason why EAS is not possible by checking
>> info messages. sched_is_eas_possible returns true if the platform
>> can do EAS at this moment.
>>
>> Depends on [PATCH v5 1/2] sched/topology: Remove EM_MAX_COMPLEXITY limit
>> to be applied first.
>
> I think it's implied as the 2 patches are sent together.
>

Yes. I mentioned it explicitly since b4 mbox can try to apply 2/2 first.
I had run into similar issues recently.

> Otherwise:
> Tested-by: Pierre Gondois <pierre.gondois@xxxxxxx>
>
>>

Thank you very much for testing it and providing the tag.

>> Signed-off-by: Shrikanth Hegde <sshegde@xxxxxxxxxxxxxxxxxx>
>> ---
>>   Documentation/admin-guide/sysctl/kernel.rst |   3 +-
>>   kernel/sched/topology.c                     | 112 +++++++++++++-------
>>   2 files changed, 76 insertions(+), 39 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
>> index cf33de56da27..d89ac2bd8dc4 100644
>> --- a/Documentation/admin-guide/sysctl/kernel.rst
>> +++ b/Documentation/admin-guide/sysctl/kernel.rst
>> @@ -1182,7 +1182,8 @@ automatically on platforms where it can run (that is,
>>   platforms with asymmetric CPU topologies and having an Energy
>>   Model available). If your platform happens to meet the
>>   requirements for EAS but you do not want to use it, change
>> -this value to 0.
>> +this value to 0. On Non-EAS platforms, write operation fails and
>> +read doesn't return anything.
>>
>>   task_delayacct
>>   ===============
>> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>> index e0b9920e7e3e..a654d0186ac0 100644
>> --- a/kernel/sched/topology.c
>> +++ b/kernel/sched/topology.c
>> @@ -212,6 +212,70 @@ static unsigned int sysctl_sched_energy_aware = 1;
>>   static DEFINE_MUTEX(sched_energy_mutex);
>>   static bool sched_energy_update;
>>
>> +extern struct cpufreq_governor schedutil_gov;
>> +static bool sched_is_eas_possible(const struct cpumask *cpu_mask)
>> +{
>> +    bool any_asym_capacity = false;
>> +    struct cpufreq_policy *policy;
>> +    struct cpufreq_governor *gov;
>> +    int i;
>> +
>> +    /* EAS is enabled for asymmetric CPU capacity topologies. */
>> +    for_each_cpu(i, cpu_mask) {
>> +        if (per_cpu(sd_asym_cpucapacity, i)) {
>> +            any_asym_capacity = true;
>> +            break;
>> +        }
>> +    }
>> +    if (!any_asym_capacity) {
>> +        if (sched_debug()) {
>> +            pr_info("rd %*pbl: Checking EAS, CPUs do not have asymmetric capacities\n",
>> +                cpumask_pr_args(cpu_mask));
>> +        }
>> +        return false;
>> +    }
>> +
>> +    /* EAS definitely does *not* handle SMT */
>> +    if (sched_smt_active()) {
>> +        if (sched_debug()) {
>> +            pr_info("rd %*pbl: Checking EAS, SMT is not supported\n",
>> +                cpumask_pr_args(cpu_mask));
>> +        }
>> +        return false;
>> +    }
>> +
>> +    if (!arch_scale_freq_invariant()) {
>> +        if (sched_debug()) {
>> +            pr_info("rd %*pbl: Checking EAS: frequency-invariant load tracking not yet supported",
>> +                cpumask_pr_args(cpu_mask));
>> +        }
>> +        return false;
>> +    }
>> +
>> +    /* Do not attempt EAS if schedutil is not being used. */
>> +    for_each_cpu(i, cpu_mask) {
>> +        policy = cpufreq_cpu_get(i);
>> +        if (!policy) {
>> +            if (sched_debug()) {
>> +                pr_info("rd %*pbl: Checking EAS, cpufreq policy not set for CPU: %d",
>> +                    cpumask_pr_args(cpu_mask), i);
>> +            }
>> +            return false;
>> +        }
>> +        gov = policy->governor;
>> +        cpufreq_cpu_put(policy);
>> +        if (gov != &schedutil_gov) {
>> +            if (sched_debug()) {
>> +                pr_info("rd %*pbl: Checking EAS, schedutil is mandatory\n",
>> +                    cpumask_pr_args(cpu_mask));
>> +            }
>> +            return false;
>> +        }
>> +    }
>> +
>> +    return true;
>> +}
>> +
>>   void rebuild_sched_domains_energy(void)
>>   {
>>       mutex_lock(&sched_energy_mutex);
>> @@ -231,6 +295,15 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
>>           return -EPERM;
>>
>>       ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
>> +    if (!sched_is_eas_possible(cpu_active_mask)) {
>> +        if (write) {
>> +            return -EOPNOTSUPP;
>> +        } else {
>> +            *lenp = 0;
>> +            return 0;
>> +        }
>> +    }
>> +
>>       if (!ret && write) {
>>           state = static_branch_unlikely(&sched_energy_present);
>>           if (state != sysctl_sched_energy_aware)
>> @@ -351,61 +424,24 @@ static void sched_energy_set(bool has_eas)
>>    *    4. schedutil is driving the frequency of all CPUs of the rd;
>>    *    5. frequency invariance support is present;
>>    */
>> -extern struct cpufreq_governor schedutil_gov;
>>   static bool build_perf_domains(const struct cpumask *cpu_map)
>>   {
>>       int i;
>>       struct perf_domain *pd = NULL, *tmp;
>>       int cpu = cpumask_first(cpu_map);
>>       struct root_domain *rd = cpu_rq(cpu)->rd;
>> -    struct cpufreq_policy *policy;
>> -    struct cpufreq_governor *gov;
>>
>>       if (!sysctl_sched_energy_aware)
>>           goto free;
>>
>> -    /* EAS is enabled for asymmetric CPU capacity topologies. */
>> -    if (!per_cpu(sd_asym_cpucapacity, cpu)) {
>> -        if (sched_debug()) {
>> -            pr_info("rd %*pbl: CPUs do not have asymmetric capacities\n",
>> -                    cpumask_pr_args(cpu_map));
>> -        }
>> -        goto free;
>> -    }
>> -
>> -    /* EAS definitely does *not* handle SMT */
>> -    if (sched_smt_active()) {
>> -        pr_warn("rd %*pbl: Disabling EAS, SMT is not supported\n",
>> -            cpumask_pr_args(cpu_map));
>> -        goto free;
>> -    }
>> -
>> -    if (!arch_scale_freq_invariant()) {
>> -        if (sched_debug()) {
>> -            pr_warn("rd %*pbl: Disabling EAS: frequency-invariant load tracking not yet supported",
>> -                cpumask_pr_args(cpu_map));
>> -        }
>> +    if (!sched_is_eas_possible(cpu_map))
>>           goto free;
>> -    }
>>
>>       for_each_cpu(i, cpu_map) {
>>           /* Skip already covered CPUs. */
>>           if (find_pd(pd, i))
>>               continue;
>>
>> -        /* Do not attempt EAS if schedutil is not being used. */
>> -        policy = cpufreq_cpu_get(i);
>> -        if (!policy)
>> -            goto free;
>> -        gov = policy->governor;
>> -        cpufreq_cpu_put(policy);
>> -        if (gov != &schedutil_gov) {
>> -            if (rd->pd)
>> -                pr_warn("rd %*pbl: Disabling EAS, schedutil is mandatory\n",
>> -                        cpumask_pr_args(cpu_map));
>> -            goto free;
>> -        }
>> -
>>           /* Create the new pd and add it to the local list. */
>>           tmp = pd_init(i);
>>           if (!tmp)
>> --
>> 2.39.3
>>

I will send out v6 with these changes to the changelog and the Tested-by tag.
I will wait for a while first to see if there are any concerns or comments.