Re: [PATCH v4 1/2] sched/fair: Check a task has a fitting cpu when updating misfit

From: Qais Yousef
Date: Wed Jan 24 2024 - 17:30:11 EST


On 01/23/24 09:26, Vincent Guittot wrote:
> On Fri, 5 Jan 2024 at 23:20, Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
> >
> > From: Qais Yousef <qais.yousef@xxxxxxx>
> >
> > If a misfit task is affined to a subset of the possible cpus, we need to
> > verify that one of these cpus can fit it. Otherwise the load balancer
> > code will continuously trigger needlessly leading the balance_interval
> > to increase in return and eventually end up with a situation where real
> > imbalances take a long time to address because of this impossible
> > imbalance situation.
>
> If your problem is about increasing balance_interval, it would be
> better to not increase the interval is such case.
> I mean that we are able to detect misfit_task conditions for the
> periodic load balance so we should be able to not increase the
> interval in such cases.
>
> If I'm not wrong, your problem only happens when the system is
> overutilized and we have disable EAS

Yes and no. There are two concerns here:

1.

So this patch is a generalized form of 0ae78eec8aa6 ("sched/eas: Don't update
misfit status if the task is pinned") which is when I originally noticed the
problem and this patch was written along side it.

We have unlinked misfit from overutilized since then.

And to be honest I am not sure if flattening of topology matters too since
I first noticed this, which was on Juno which doesn't have flat topology.

FWIW I can still reproduce this, but I have a different setup now. On M1 mac
mini if I spawn a busy task affined to littles then expand the mask for
a single big core; I see big delays (>500ms) without the patch. But with the
patch it moves in few ms. The delay without the patch is too large and I can't
explain it. So the worry here is that generally misfit migration not happening
fast enough due to this fake misfit cases.

I did hit issues where with this patch I saw big delays sometimes. I have no
clue why this happens. So there are potentially more problems to chase.

My expectations that newidle balance should be able to pull misfit regardless
of balance_interval. So the system has to be really busy or really quite to
notice delays. I think prior to flat topology this pull was not guaranteed, but
with flat topology it should happen.

On this system if I expand the mask to all cpus (instead of littles + single
big), the issue is not as easy to reproduce, but I captured 35+ms delays
- which is long if this task was carrying important work and needs to
upmigrate. I thought newidle balance is more likely to pull it sooner, but I am
not 100% sure.

It's a 6.6 kernel I am testing with.

2.

Here yes the concern is that when we are overutilized and load balance is
required, this unnecessarily long delay can cause potential problems.


Cheers

--
Qais Yousef