Re: [PATCH v5 2/2] sched/topology: change behaviour of sysctl sched_energy_aware based on the platform

From: Shrikanth Hegde
Date: Wed Oct 04 2023 - 11:00:11 EST




On 10/4/23 4:57 PM, Valentin Schneider wrote:
> On 29/09/23 21:22, Shrikanth Hegde wrote:

Hi Valentin, Thanks for taking a look at this patchset.

>> +static bool sched_is_eas_possible(const struct cpumask *cpu_mask)
>> +{
>> + bool any_asym_capacity = false;
>> + struct cpufreq_policy *policy;
>> + struct cpufreq_governor *gov;
>> + int i;
>> +
>> + /* EAS is enabled for asymmetric CPU capacity topologies. */
>> + for_each_cpu(i, cpu_mask) {
>> + if (per_cpu(sd_asym_cpucapacity, i)) {
>
> Lockdep should complain here in the sysctl path - this is an RCU-protected
> pointer.
>
> rcu_access_pointer() should do since you're not dereferencing the pointer.

Yes. I did miss to catch that since mostly copied the snippets from build_perf_domains.

>
>> + any_asym_capacity = true;
>> + break;
>> + }
>> + }
>
>> @@ -231,6 +295,15 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
>> return -EPERM;
>>
>> ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
>
> Shouldn't this happen after we check sched_is_eas_possible()? Otherwise
> AFAICT a write can actually happen despite !sched_is_eas_possible().

Yes. That's right. Will change that.

>
>> + if (!sched_is_eas_possible(cpu_active_mask)) {
>> + if (write) {
>> + return -EOPNOTSUPP;
>> + } else {
>> + *lenp = 0;
>> + return 0;
>> + }
>> + }
>
> But now this is making me wonder, why not bite the bullet and store
> somewhere whether we ever managed to enable EAS? Something like so?
> (I didn't bother making this yet another static key given this is not a hot
> path at all)

IIUC, Problem with this is, a platform which can do EAS now, may not be able to do EAS
sometime later.
for example, frequency governor is changed from performance to
schedutil, EAS can be enabled and sched_energy_once will be set, but later it can be
set to performance again. In that case saying it is EAS capable is wrong.

> ---
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index e0b9920e7e3e4..abd950f434206 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -209,6 +209,7 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
> #if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
> DEFINE_STATIC_KEY_FALSE(sched_energy_present);
> static unsigned int sysctl_sched_energy_aware = 1;
> +static bool __read_mostly sched_energy_once;
> static DEFINE_MUTEX(sched_energy_mutex);
> static bool sched_energy_update;
>
> @@ -230,6 +231,15 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
> if (write && !capable(CAP_SYS_ADMIN))
> return -EPERM;
>
> + if (!sched_energy_once) {
> + if (write) {
> + return -EOPNOTSUPP;
> + } else {
> + *lenp = 0;
> + return 0;
> + }
> + }
> +
> ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> if (!ret && write) {
> state = static_branch_unlikely(&sched_energy_present);
> @@ -340,6 +350,8 @@ static void sched_energy_set(bool has_eas)
> if (sched_debug())
> pr_info("%s: starting EAS\n", __func__);
> static_branch_enable_cpuslocked(&sched_energy_present);
> + // Record that we managed to enable EAS at least once
> + sched_energy_once = true;
> }
> }
>
>