Re: [PATCH v4] sched/topology: change behaviour of sysctl sched_energy_aware based on the platform

From: Pierre Gondois
Date: Thu Sep 28 2023 - 05:26:44 EST


Hello Shrikanth, Dietmar,

On 9/27/23 19:08, Shrikanth Hegde wrote:


On 9/27/23 6:44 PM, Dietmar Eggemann wrote:
Ah, BTW s/quentin.perret@xxxxxxx/qperret@xxxxxxxxxx

On 27/09/2023 10:14, Shrikanth Hegde wrote:


On 9/27/23 2:59 AM, Dietmar Eggemann wrote:
On 26/09/2023 12:00, Shrikanth Hegde wrote:

[...]

At present, though platform doesn't support EAS, this sysctl returns 1
and it ends up calling rebuild of sched domain on write to 1 and

sched domains are not rebuild in this case, i.e.
partition_sched_domains_locked() does not call detach_destroy_domains()
or build_sched_domains(). Only build_perf_domains() is called which
bails out if !sysctl_sched_energy_aware or one of the EAS conditions is
not true.


ok. that's because it goes to match1 and match2 right?

yes.

[...]

@@ -231,6 +289,14 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
return -EPERM;

ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+ if (!sched_is_eas_possible(cpu_active_mask)) {

This is using `cpu_active_mask` so EAS can only be disabled system-wide.

So I experimented with an exclusive cpuset setup creating a symmetric
(cs1) and an asymmetric (cs2) island on my Juno with its cpumask = [l B
B l l l] (l - little CPU, B - Big CPU).

root@juno:~# cd /sys/fs/cgroup/cpuset
root@juno:/sys/fs/cgroup/cpuset# mkdir cs1
root@juno:/sys/fs/cgroup/cpuset# echo 1 > cs1/cpuset.cpu_exclusive
root@juno:/sys/fs/cgroup/cpuset# echo 0 > cs1/cpuset.mems
root@juno:/sys/fs/cgroup/cpuset# echo 4,5 > cs1/cpuset.cpus
root@juno:/sys/fs/cgroup/cpuset# mkdir cs2
root@juno:/sys/fs/cgroup/cpuset# echo 1 > cs2/cpuset.cpu_exclusive
root@juno:/sys/fs/cgroup/cpuset# echo 0 > cs2/cpuset.mems
root@juno:/sys/fs/cgroup/cpuset# echo 0-3 > cs2/cpuset.cpus
root@juno:/sys/fs/cgroup/cpuset# echo 0 > cpuset.sched_load_balance

[ 3021.761278] root_domain 0-3: pd1:{ cpus=1-2 nr_pstate=5 } pd0:{
cpus=0,3-5 nr_pstate=5 }

root@juno:~# echo 0 > /proc/sys/kernel/sched_energy_aware

log messages:
...
[ 3143.538583] rd 4-5: Disabling EAS
[ 3143.549569] rd 0-3: Disabling EAS
[ 3143.560681] sched_energy_set: stopping EAS
...

root@juno:~# echo 1 > /proc/sys/kernel/sched_energy_aware

log messages:
...
[ 3223.277521] root_domain 0-3: pd1:{ cpus=1-2 nr_pstate=5 } pd0:{
cpus=0,3-5 nr_pstate=5 }
[ 3223.293409] sched_energy_set: starting EAS

Seems still to work correctly.

I see that can be a issue. using first cpu here check to asymmetric cpu capacity.
It would have worked here, since the first group had asymmetry. ( l B B l ).
It wouldn't have worked if the first group had like ( l l ) and second group has ( l B B l )

Yeah, that's true.

sched_is_eas_possible(const struct cpumask *cpu_mask)

!per_cpu(sd_asym_cpucapacity, cpumask_first(cpu_mask));

cpusets cs1=[0,5] and cs2=[1-4] as such an example.


right.


Instead of cpu_active_mask, I can make use of ndoms_cur and doms_cur[i] logic to
traverse through possible doms, and if none of the domains have asymmetric cpu capacity
return false. Does that look right?

rebuild_sched_domains()

rebuild_sched_domains_locked()

ndoms = generate_sched_domains(&doms, &attr)

You would need generate_sched_domains() in sched_energy_aware_handler()?


clearly I didnt think through this. ndoms_cur and doms_cur are updated at the end.
So If EAS is possible at boot, this would fail.


What sched_is_eas_possible needs is if there is asymmetric cpu topology.
Simpler loop of all CPU's and checking if there any of them have sd_asym_cpucapacity might do,
though it goes through all CPU's, Since these functions are not in hot path
So it should not affect any performance. Something like below would work?

bool any_asym_capacity = false;

/* EAS is enabled for asymmetric CPU capacity topologies. */
for_each_cpu(i, cpu_mask) {
if (per_cpu(sd_asym_cpucapacity, i)) {
any_asym_capacity = true;
break;
}
}
if (!any_asym_capacity) {
if (sched_debug()) {
pr_info("rd %*pbl: Checking EAS, CPUs do not have asymmetric capacities\n",
cpumask_pr_args(cpu_mask));
}
return false;
}


FYIW I could reproduce the issue mentioned above, and the suggested bit
seems to solve it.



[...]

@@ -458,6 +487,8 @@ static bool build_perf_domains(const struct cpumask *cpu_map)
return !!pd;

free:
+ if (sched_debug())
+ pr_warn("rd %*pbl: Disabling EAS", cpumask_pr_args(cpu_map));

Shouldn't this be used in condition `if (!sysctl_sched_energy_aware)`?
Otherwise we have 2 warnings when the other conditions which leads to
`goto free` aren't met.
Since sched_energy_set has the info messages about start and stop of EAS, maybe
this debug is not needed. Will remove it. Doing it only `if (!sysctl_sched_energy_aware)`

OK.

also doesn't seem correct, as calling free in this function would free the perf_domains.

But !sched_is_eas_possible() also does `goto free` and in there we we
emit pr_info's indicating why EAS isn't possible right now.

When issuing a:

# echo 0 > /proc/sys/kernel/sched_energy_aware

we would see in the logs:

...
[ 416.325324] rd 0-5: sysctl_sched_energy_aware is N <-- (*)
[ 416.337844] sched_energy_set: stopping EAS
...

but maybe (*) is not necessary ...

I m not sure I understand 100% the point, but it seems to me that when
changing sysctl_sched_energy_aware's value, either:
- EAS is not possible, and sched_is_eas_possible() will output the reason why
(i.e. "Checking EAS, [...]")
- EAS was deactivated by the user, and it is possible to check the value of
sysctl_sched_energy_aware. So there would be no need to have an additional
message "Checking EAS, sysclt sched_energy_aware set to N"

When build_perf_domains() is called while rebuilding the sched domains, it is also
possible to check sched_energy_aware's value and understand why EAS is not enabled.


ok. I think we can add similar info message in if(!sysctl_sched_energy_aware) like below?
Would that be enough?

if (!sysctl_sched_energy_aware) {
if (sched_debug()) {
pr_info("rd %*pbl: Checking EAS, sysclt sched_energy_aware set to N\n",
cpumask_pr_args(cpu_map));
}

(No need for brackets here, just in case)

goto free;
}

and remove the below one.
if (sched_debug())
pr_warn("rd %*pbl: Disabling EAS", cpumask_pr_args(cpu_map));


So there will be these "Checking EAS" and if possible, "starting EAS" or "stopping EAS" message.


I will rebase the patch removing the EM complexity and check it/resend it,
like this the 2 patches could go together:
https://lore.kernel.org/lkml/20221121094336.3250917-1-pierre.gondois@xxxxxxx/

Regards,
Pierre