Re: [PATCH v4 1/2] sched/fair: Check a task has a fitting cpu when updating misfit

From: Vincent Guittot
Date: Wed Jan 31 2024 - 08:55:53 EST


On Wed, 31 Jan 2024 at 00:57, Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
>
> On 01/30/24 10:41, Vincent Guittot wrote:
> > On Mon, 29 Jan 2024 at 00:50, Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
> > >
> > > On 01/26/24 15:08, Vincent Guittot wrote:
> > >
> > > > > TBH I had a bit of confirmation bias that this is a problem based on the fix
> > > > > (0ae78eec8aa6) that we had in the past. So on verification I looked at
> > > > > balance_interval, and this reproducer, which is not the same as the original
> > > > > one, might be exposing another problem; I didn't think twice about it.
> > > >
> > > > I checked the behavior more deeply and I confirm that I don't see
> > > > an improvement for the use case described above. I would say that it's
> > > > even worse, as I can see some runs where the task stays on a little
> > > > CPU even though a big core has been added to the affinity. Bearing in
> > > > mind that my system is pretty idle, which means that there is almost
> > > > no other reason to trigger an ilb than the misfit task, the change in
> > > > check_misfit_status() is probably the reason for never kicking an ilb
> > > > in such a case.
> > >
> > > It seems I reproduced another problem while trying to reproduce the original
> > > issue, eh.
> > >
> > > I did dig more, and from what I see the issue is that rd->overload is not
> > > being set correctly, which I believe is what causes the delays (see the
> > > attached picture showing how rd->overload is 0 with some spikes). Only
> > > when CPU7's newidle_balance() coincided with rd->overload being 1 did the
> > > migration happen. With the below hack I can see that rd->overload is 1
> > > all the time
> >
> > But here you rely on another activity happening on CPU7 whereas the
>
> I don't want to rely on that. I think this is a problem too. And this is what
> ends up happening from what I see, sometimes at least.
>
> When is newidle_balance expected to pull anyway? I agree we shouldn't
> rely on it happening at random, but if it happens sooner, it should pull, no?
>
> > misfit should trigger the load balance by itself and not expect
> > another task waking up and then sleeping on cpu7 to trigger a newidle
> > balance. We want a normal idle load balance, not a newidle_balance
>
> I think there's a terminology problem. I thought you meant newidle_balance()
> by ilb. It seems you're referring to load_balance() called from
> rebalance_domains() when the tick happens at idle?

newidle_balance is different from idle load balance. newidle_balance
happens when the cpu becomes idle, whereas busy and idle load balance
happen at the tick.
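
To make the distinction concrete, here is a simplified sketch of the
two entry points (not verbatim kernel code; function names as in
mainline around v6.7):

    /*
     * 1) newidle_balance(): runs synchronously on this CPU when it is
     *    about to go idle, i.e. when pick_next_task_fair() finds no
     *    runnable task to pick.
     */

    /* 2) busy/idle load balance: driven from the periodic tick. */
    void trigger_load_balance(struct rq *rq)
    {
            /* raises SCHED_SOFTIRQ, which runs run_rebalance_domains() */
            if (time_after_eq(jiffies, rq->next_balance))
                    raise_softirq(SCHED_SOFTIRQ);

            /* may kick one idle CPU to balance on behalf of all idle CPUs */
            nohz_balancer_kick(rq);
    }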

>
> I thought this was not kicking. But I just double checked my traces, and I
> was getting confused because I was looking at where run_rebalance_domains()
> would happen, for example, on CPU2, while the balance would actually be for
> CPU7.

An idle load balance happens either on the target CPU itself, if its
tick is not stopped, or on one idle CPU that we kick to run the idle
load balance on behalf of all idle CPUs. It is the latter case that no
longer happens with your patch and the change in check_misfit_status().
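
For reference, that kick decision is made in nohz_balancer_kick();
roughly (abridged from kernel/sched/fair.c around v6.7, not verbatim):

    static void nohz_balancer_kick(struct rq *rq)
    {
            unsigned int flags = 0;
            int cpu = rq->cpu;
            struct sched_domain *sd;
            ...
            sd = rcu_dereference(per_cpu(sd_asym_cpucapacity, cpu));
            if (sd) {
                    /*
                     * If check_misfit_status() never returns true here,
                     * no idle CPU is ever kicked for the misfit task.
                     */
                    if (check_misfit_status(rq, sd)) {
                            flags = NOHZ_STATS_KICK | NOHZ_BALANCE_KICK;
                            goto unlock;
                    }
            }
            ...
    unlock:
            rcu_read_unlock();
            if (flags)
                    kick_ilb(flags); /* pick one idle CPU and send it an IPI */
    }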

>
> No clue why it fails to pull... I can actually see that we call
> load_balance() twice for some (not all) entries into rebalance_domains(),
> so we don't always operate on the two domains. But that's not necessarily
> a problem.

We have 3 different reasons for kicking an idle load balance:
- to do an actual balance of tasks
- to update stats, i.e. blocked load
- to update nohz.next_balance

You are interested in the 1st one, but it's most probably for the last
2 reasons that this happens.
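
These three reasons map onto the NOHZ kick flags (abridged from
kernel/sched/sched.h):

    #define NOHZ_BALANCE_KICK  BIT(NOHZ_BALANCE_KICK_BIT) /* balance tasks */
    /* Update blocked load */
    #define NOHZ_STATS_KICK    BIT(NOHZ_STATS_KICK_BIT)
    /* Update nohz.next_balance */
    #define NOHZ_NEXT_KICK     BIT(NOHZ_NEXT_KICK_BIT)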

>
> I think it's a good opportunity to add some tracepoints to help break this
> path down. If you have suggestions for things to record, that'd be helpful.
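
Something along the lines of the existing bare tracepoints could work,
e.g. a (hypothetical) hook at the kick_ilb() decision, following the
DECLARE_TRACE style already used in include/trace/events/sched.h:

    /* Hypothetical example, not an existing tracepoint: */
    DECLARE_TRACE(sched_kick_ilb_tp,
            TP_PROTO(unsigned int cpu, unsigned int flags),
            TP_ARGS(cpu, flags));

That would let a module hook the kick path and record which CPU was
chosen and with which NOHZ_*_KICK flags.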