Re: [RFC PATCH] sched/fair: Fix impossible migrate_util scenario in load balance

From: Vincent Guittot
Date: Fri Jul 21 2023 - 09:53:16 EST


On Friday 21 Jul 2023 at 11:57:11 (+0100), Qais Yousef wrote:
> On 07/20/23 14:31, Vincent Guittot wrote:
>
> > I was trying to reproduce the behavior but I was failing until I
> > realized that this code path is used when the 2 groups are not sharing
> > their cache. Which topology do you use? I thought that DynamIQ, with
> > cache shared between all 8 CPUs, was the norm for arm64 embedded
> > devices now.
>
> Hmm, good question. Phantom domains didn't die, which I think is what is
> causing this. I can look into whether this is for a good reason or just a
> historical artifact.
>
> >
> > Also when you say "the little cluster capacity is very small nowadays
> > (around 200 or less)", is it the capacity of 1 core or of the cluster?
>
> I meant one core. So in my case all the littles were busy except for one that
> was mostly idle and never pulled a task from mid where two tasks were stuck on
> a CPU there. And the logs I have added were showing me that the env->imbalance
> was in the 150+ range but the task we pull was in the 350+ range.

I'm not able to reproduce your problem with v6.5-rc2 and without phantom
domains. That is expected: the groups share cache and the busiest group's
weight is 1, so we take this path:

		if (busiest->group_weight == 1 || sds->prefer_sibling) {
			/*
			 * When prefer sibling, evenly spread running tasks on
			 * groups.
			 */
			env->migration_type = migrate_task;
			env->imbalance = sibling_imbalance(env, sds, busiest, local);
		} else {
>
> I should have mentioned that I'm on 5.15 - sorry with Android it's hard to run
> mainline on products :( But this code as far as I can tell hasn't changed much.
>
> I can try to find something that runs mainline and reproduce there if you think
> my description of the problem is not clear or applicable.
>
>
> Thanks
>
> --
> Qais Yousef