Re: [PATCH v4] sched/topology: change behaviour of sysctl sched_energy_aware based on the platform

From: Shrikanth Hegde
Date: Wed Sep 27 2023 - 13:08:42 EST

On 9/27/23 6:44 PM, Dietmar Eggemann wrote:
> Ah, BTW s/quentin.perret@xxxxxxx/qperret@xxxxxxxxxx
>
> On 27/09/2023 10:14, Shrikanth Hegde wrote:
>>
>>
>> On 9/27/23 2:59 AM, Dietmar Eggemann wrote:
>>> On 26/09/2023 12:00, Shrikanth Hegde wrote:
>
> [...]
>
>>>> At present, even though the platform doesn't support EAS, this sysctl returns 1,
>>>> and it ends up calling a rebuild of the sched domains on a write of 1 and
>>>
>>> sched domains are not rebuilt in this case, i.e.
>>> partition_sched_domains_locked() does not call detach_destroy_domains()
>>> or build_sched_domains(). Only build_perf_domains() is called which
>>> bails out if !sysctl_sched_energy_aware or one of the EAS conditions is
>>> not true.
>>>
>>
>> ok. that's because it goes to match1 and match2, right?
>
> yes.
>
> [...]
>
>>>> @@ -231,6 +289,14 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
>>>> return -EPERM;
>>>>
>>>> ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
>>>> + if (!sched_is_eas_possible(cpu_active_mask)) {
>>>
>>> This is using `cpu_active_mask` so EAS can only be disabled system-wide.
>>>
>>> So I experimented with an exclusive cpuset setup creating a symmetric
>>> (cs1) and an asymmetric (cs2) island on my Juno with its cpumask = [l B
>>> B l l l] (l - little CPU, B - Big CPU).
>>>
>>> root@juno:~# cd /sys/fs/cgroup/cpuset
>>> root@juno:/sys/fs/cgroup/cpuset# mkdir cs1
>>> root@juno:/sys/fs/cgroup/cpuset# echo 1 > cs1/cpuset.cpu_exclusive
>>> root@juno:/sys/fs/cgroup/cpuset# echo 0 > cs1/cpuset.mems
>>> root@juno:/sys/fs/cgroup/cpuset# echo 4,5 > cs1/cpuset.cpus
>>> root@juno:/sys/fs/cgroup/cpuset# mkdir cs2
>>> root@juno:/sys/fs/cgroup/cpuset# echo 1 > cs2/cpuset.cpu_exclusive
>>> root@juno:/sys/fs/cgroup/cpuset# echo 0 > cs2/cpuset.mems
>>> root@juno:/sys/fs/cgroup/cpuset# echo 0-3 > cs2/cpuset.cpus
>>> root@juno:/sys/fs/cgroup/cpuset# echo 0 > cpuset.sched_load_balance
>>>
>>> [ 3021.761278] root_domain 0-3: pd1:{ cpus=1-2 nr_pstate=5 } pd0:{
>>> cpus=0,3-5 nr_pstate=5 }
>>>
>>> root@juno:~# echo 0 > /proc/sys/kernel/sched_energy_aware
>>>
>>> log messages:
>>> ...
>>> [ 3143.538583] rd 4-5: Disabling EAS
>>> [ 3143.549569] rd 0-3: Disabling EAS
>>> [ 3143.560681] sched_energy_set: stopping EAS
>>> ...
>>>
>>> root@juno:~# echo 1 > /proc/sys/kernel/sched_energy_aware
>>>
>>> log messages:
>>> ...
>>> [ 3223.277521] root_domain 0-3: pd1:{ cpus=1-2 nr_pstate=5 } pd0:{
>>> cpus=0,3-5 nr_pstate=5 }
>>> [ 3223.293409] sched_energy_set: starting EAS
>>>
>>> Seems still to work correctly.
>>
>> I see, that can be an issue. This uses the first CPU of the mask to check for asymmetric CPU capacity.
>> It would have worked here, since the first group had asymmetry ( l B B l ).
>> It wouldn't have worked if the first group had been e.g. ( l l ) and the second group ( l B B l ).
>
> Yeah, that's true.
>
> sched_is_eas_possible(const struct cpumask *cpu_mask)
>
> !per_cpu(sd_asym_cpucapacity, cpumask_first(cpu_mask));
>
> cpusets cs1=[0,5] and cs2=[1-4] as such an example.
>

right.
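
i.e. the current check only looks at the first CPU of the mask; a simplified
illustration (not the literal helper) of why that breaks for such cpusets:

        /*
         * cpumask_first(cpu_active_mask) picks CPU 0, which sits in the
         * symmetric island cs1=[0,5], so EAS would wrongly be reported
         * as not possible even though cs2=[1-4] is asymmetric.
         */
        if (!per_cpu(sd_asym_cpucapacity, cpumask_first(cpu_mask)))
                return false;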

>
>> Instead of cpu_active_mask, I can make use of the ndoms_cur and doms_cur[i] logic to
>> traverse the possible doms, and if none of the domains has asymmetric CPU capacity,
>> return false. Does that look right?
>
> rebuild_sched_domains()
>
> rebuild_sched_domains_locked()
>
> ndoms = generate_sched_domains(&doms, &attr)
>
> You would need generate_sched_domains() in sched_energy_aware_handler()?
>

clearly I didn't think this through. ndoms_cur and doms_cur are updated at the end,
so if EAS is possible at boot, this would fail.


What sched_is_eas_possible() needs to know is whether the topology has asymmetric CPU capacities.
A simple loop over all CPUs, checking whether any of them has sd_asym_cpucapacity set, might do.
Although it walks all the CPUs, these functions are not in a hot path, so it should not
affect performance. Something like the below would work?

bool any_asym_capacity = false;
int i;

/* EAS is enabled for asymmetric CPU capacity topologies. */
for_each_cpu(i, cpu_mask) {
        if (per_cpu(sd_asym_cpucapacity, i)) {
                any_asym_capacity = true;
                break;
        }
}
if (!any_asym_capacity) {
        if (sched_debug()) {
                pr_info("rd %*pbl: Checking EAS, CPUs do not have asymmetric capacities\n",
                        cpumask_pr_args(cpu_mask));
        }
        return false;
}
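
With this loop, the handler hunk quoted earlier can keep passing
cpu_active_mask, since an asymmetric CPU anywhere in the mask is found:

        ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
        if (!sched_is_eas_possible(cpu_active_mask)) {
                ...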



>>> [...]
>>>
>>>> @@ -458,6 +487,8 @@ static bool build_perf_domains(const struct cpumask *cpu_map)
>>>> return !!pd;
>>>>
>>>> free:
>>>> + if (sched_debug())
>>>> + pr_warn("rd %*pbl: Disabling EAS", cpumask_pr_args(cpu_map));
>>>
>>> Shouldn't this be used in condition `if (!sysctl_sched_energy_aware)`?
>>> Otherwise we have 2 warnings when the other conditions which leads to
>>> `goto free` aren't met.
>> Since sched_energy_set() has the info messages about starting and stopping EAS, maybe
>> this debug message is not needed. Will remove it. Doing it only under `if (!sysctl_sched_energy_aware)`
>
> OK.
>
>> also doesn't seem correct, as calling free in this function would free the perf_domains.
>
> But !sched_is_eas_possible() also does `goto free`, and in there we
> emit pr_info's indicating why EAS isn't possible right now.
>
> When issuing a:
>
> # echo 0 > /proc/sys/kernel/sched_energy_aware
>
> we would see in the logs:
>
> ...
> [ 416.325324] rd 0-5: sysctl_sched_energy_aware is N <-- (*)
> [ 416.337844] sched_energy_set: stopping EAS
> ...
>
> but maybe (*) is not necessary ...

ok. I think we can add a similar info message in the `if (!sysctl_sched_energy_aware)`
branch, like below. Would that be enough?

if (!sysctl_sched_energy_aware) {
        if (sched_debug()) {
                pr_info("rd %*pbl: Checking EAS, sysctl sched_energy_aware set to N\n",
                        cpumask_pr_args(cpu_map));
        }
        goto free;
}

and remove the below one:

        if (sched_debug())
                pr_warn("rd %*pbl: Disabling EAS", cpumask_pr_args(cpu_map));
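
Taken together, the start of build_perf_domains() would then look roughly
like this (abbreviated sketch, not the literal patch):

        if (!sysctl_sched_energy_aware) {
                if (sched_debug()) {
                        pr_info("rd %*pbl: Checking EAS, sysctl sched_energy_aware set to N\n",
                                cpumask_pr_args(cpu_map));
                }
                goto free;
        }

        if (!sched_is_eas_possible(cpu_map))
                goto free;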


So there will be these "Checking EAS" messages and, if EAS is possible, the "starting EAS" or "stopping EAS" message.
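
i.e. for `echo 0 > /proc/sys/kernel/sched_energy_aware`, Dietmar's example log
from above would become something like:

[  416.325324] rd 0-5: Checking EAS, sysctl sched_energy_aware set to N
[  416.337844] sched_energy_set: stopping EAS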