Re: [RFC PATCH] sched/fair: Fix impossible migrate_util scenario in load balance

From: Dietmar Eggemann
Date: Mon Jul 24 2023 - 09:01:14 EST


On 22/07/2023 00:04, Qais Yousef wrote:
> On 07/21/23 15:52, Vincent Guittot wrote:
>> On Friday, 21 Jul 2023 at 11:57:11 (+0100), Qais Yousef wrote:
>>> On 07/20/23 14:31, Vincent Guittot wrote:
>>>
>>>> I was trying to reproduce the behavior but kept failing until I
>>>> realized that this code path is only used when the 2 groups don't
>>>> share their cache. Which topology do you use? I thought DynamIQ,
>>>> with a shared cache between all 8 CPUs, was the norm for arm64
>>>> embedded devices now.
>>>
>>> Hmm, good question. Phantom domains didn't die, which I think is what's
>>> causing this. I can check whether this is for a good reason or just a
>>> historical artifact.
>>>
>>>>
>>>> Also, when you say "the little cluster capacity is very small nowadays
>>>> (around 200 or less)", is that the capacity of 1 core or of the cluster?
>>>
>>> I meant one core. So in my case all the littles were busy except for one
>>> that was mostly idle and never pulled a task from the mid cluster, where
>>> two tasks were stuck on one CPU. And the logs I added showed that
>>> env->imbalance was in the 150+ range while the task we pulled was in the
>>> 350+ range.
>>
>> I'm not able to reproduce your problem with v6.5-rc2 without a phantom
>> domain, which is expected because we share the cache and the group weight
>> is 1, so we use this path:
>>
>> if (busiest->group_weight == 1 || sds->prefer_sibling) {
>>         /*
>>          * When prefer sibling, evenly spread running tasks on
>>          * groups.
>>          */
>>         env->migration_type = migrate_task;
>>         env->imbalance = sibling_imbalance(env, sds, busiest, local);
>> } else {
>
> I missed the dependency on topology. So yes, you're right, this needs to be
> addressed first. I seem to remember Sudeep merged some patches that flatten
> these topologies.
>
> Let me chase down this topology issue first.

Sudeep's patches align the topology cpumasks with the cache cpumasks.

tip/sched/core:

root@juno:~# cat /sys/devices/system/cpu/cpu*/topology/package_cpus
3f
3f
3f
3f
3f
3f

v5.9:

root@juno:~# cat /sys/devices/system/cpu/cpu*/topology/package_cpus
39
06
06
39
39
39

So Android userspace won't be able to detect uArch boundaries via
`package_cpus` any longer.
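
If needed, userspace could presumably still derive uArch boundaries from
other sysfs files; a minimal sketch (my assumption, not necessarily what
Android does) would be to group CPUs by their capacity or by a shared
cache level, e.g. the cluster-level L2 (index2) on Juno:

root@juno:~# cat /sys/devices/system/cpu/cpu*/cpu_capacity
root@juno:~# cat /sys/devices/system/cpu/cpu*/cache/index2/shared_cpu_map

CPUs reporting the same capacity resp. the same shared_cpu_map belong to
the same uArch cluster on such a system.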

The phantom domain (DIE) in Android is a legacy decision from within
Android. The pre-mainline Energy Model was attached to the sched domain
topology hierarchy, and IMHO other Android functionality then started to
rely on this. It could be removed regardless of Sudeep's patches in case
Android is OK with it.

The phantom domain is probably set up via a DT cpu-map entry:

cpu-map {
        cluster0 {              <-- enforce phantom domain
                core0 {
                        cpu = <&CPU0>;
                };
                ...
                core3 {
                        cpu = <&CPU3>;
                };
        };
        cluster1 {
                ...

Juno (arch/arm64/boot/dts/arm/juno.dts) also uses the cpu-map to enforce
uArch boundaries that are congruent with the DIE group boundaries.
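
The resulting domain names can be double-checked via debugfs (a quick
sketch, assuming CONFIG_SCHED_DEBUG and a mounted debugfs; on this setup
domain0/domain1 should come out as MC/DIE):

root@juno:~# cat /sys/kernel/debug/sched/domains/cpu0/domain*/name
MC
DIE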

tip/sched/core:

# cat /proc/schedstat | awk '{ print $1 " " $2}' | head -5
...
cpu0 0
domain0 39
domain1 3f

v5.9:

# cat /proc/schedstat | awk '{ print $1 " " $2}' | head -5
...
cpu0 0
domain0 39
domain1 3f

I.e. 0x39 covers CPUs {0,3,4,5} (the littles) and 0x3f covers all 6 CPUs,
on both kernels: Sudeep's patches change the userspace-visible topology
files, not the sched domain hierarchy itself.

We gave a talk at LPC '22 about the influence of this patch set and the
phantom domain legacy issue:

https://lpc.events/event/16/contributions/1342/attachments/962/1883/LPC-2022-Android-MC-Phantom-Domains.pdf

[...]