Re: [PATCH] sched/topology: remove sysctl_sched_energy_aware depending on the architecture

From: Shrikanth Hegde
Date: Wed Aug 30 2023 - 15:29:38 EST




On 8/30/23 6:46 PM, Chen Yu wrote:
> Hi Shrikanth,
>
> On 2023-08-29 at 12:20:40 +0530, Shrikanth Hegde wrote:
>> Currently sysctl_sched_energy_aware doesn't alter the said behaviour on
>> some of the architectures. IIUC its meant to either force rebuild the
>> perf domains or cleanup the perf domains by echoing 1 or 0 respectively.
>>
>> perf domains are not built when there is SMT, or when there is no
>> Asymmetric CPU topologies or when there is no frequency invariance.
>> Since such cases EAS is not set and perf domains are not built. By
>> changing the values of sysctl_sched_energy_aware, its not possible to
>> force build the perf domains. Hence remove this sysctl on such platforms
>> that dont support it at boot. Some of the settings can be changed later
>> such as smt_active by offlining the CPU's, In those cases
>> build_perf_domains returns true, in that case re-enable the sysctl.
>>
>> Signed-off-by: Shrikanth Hegde <sshegde@xxxxxxxxxxxxxxxxxx>
>>
>
> [snip...]
>
>> @@ -380,15 +400,11 @@ static bool build_perf_domains(const struct cpumask *cpu_map)
>> struct cpufreq_policy *policy;
>> struct cpufreq_governor *gov;
>>
>> - if (!sysctl_sched_energy_aware)
>> - goto free;
>> -
>
> I tried to ramp up the EAS and maybe I overlooked it, why do we remove above
> check? If the system boots with EAS enabled and sysctl_sched_energy_aware
> set to 1, then the user wants to disable EAS then changing sysctl_sched_energy_aware
> to 0. Without above check, how could the perf domain be freed?
>

Thank you very much Chen Yu, for taking a look. I had missed that case.

The reason for removing it was, conditions which are checked while doing
build_perf_domains can be changed after system boot. like by off-lining the SMT
siblings, sched_smt_active can be false, it is possible in future for the other
conditions to be false as well. In that case, having the sysctl check wouldn't
allow it to build the perf domains.

Since any change in this sysctl, will set sched_energy_update. That can be used
in combination to detect whether change is coming due to sysctl change or due
to system becoming capable of doing perf domains.


Something like below would work.

+ if (!sysctl_sched_energy_aware && sched_energy_update)
+ goto free;

This will introduce another issue, that it will remove the sysctl after setting
it to 0. Have taken care of that with below code change.


@@ -349,7 +349,16 @@ static void sched_energy_set(bool has_eas)
pr_info("%s: stopping EAS\n", __func__);
static_branch_disable_cpuslocked(&sched_energy_present);
#ifdef CONFIG_PROC_SYSCTL
- unregister_sysctl_table(sysctl_eas_header);
+ /*
+ * if the architecture supports EAS and forcefully
+ * perf domains are destroyed, there should be a sysctl
+ * to eanble it later. If this was due to dynamic system
+ * change such as SMT<->NON_SMT then remove sysctl.
+ */
+ if (sysctl_eas_header && !sched_energy_update) {
+ unregister_sysctl_table(sysctl_eas_header);
+ sysctl_eas_header = NULL;
+ }
#endif
sysctl_sched_energy_aware = 0;
} else if (has_eas && !static_branch_unlikely(&sched_energy_present)) {
@@ -357,7 +366,8 @@ static void sched_energy_set(bool has_eas)
pr_info("%s: starting EAS\n", __func__);
static_branch_enable_cpuslocked(&sched_energy_present);
#ifdef CONFIG_PROC_SYSCTL
- sysctl_eas_header = register_sysctl("kernel", sched_energy_aware_sysctls);
+ if (!sysctl_eas_header)
+ sysctl_eas_header = register_sysctl("kernel", sched_energy_aware_sysctls);
#endif
sysctl_sched_energy_aware = 1;
}



Different Cases:

Case1. system while booting has EAS capability, sysctl will be set 1. Hence
perf domains will be built On changing the sysctl to 0, above condition will be
true and hence it would be freed and sysctl will not be removed. later sysctl
is changed to 1, enabling the perf domains rebuilt again. Since sysctl is
already there, it will skip register.

Case2. System while booting doesn't have EAS Capability. Later after system
change it becomes capable of EAS. sched_energy_update is false. Hence though
sysctl is 0, will go ahead and try to enable eas. if has_eas is true, then
sysctl will be registered. After that it any sysctl change is same as Case1.

Case3. System becomes not capable of EAS due to system change. Here since
sched_energy_update is false. build_perf_domains return has_eas as false and
Since this is dynamic change remove the sysctl.


A bit convoluted because the design here, is considering dynamically system
becoming capable of EAS. Will send out a V2 with the above change after some
testing (have to try with HACK since EAS is by default not enabled on power10)