Re: [PATCH] sched/pelt: avoid underestimate of task utilization

From: Qais Yousef
Date: Thu Nov 23 2023 - 11:46:09 EST


On 11/23/23 14:27, Lukasz Luba wrote:
>
>
> On 11/21/23 23:01, Qais Yousef wrote:
> > On 11/22/23 15:01, Vincent Guittot wrote:
> > > It has been reported that a thread's util_est can significantly
> > > decrease as a result of sharing the CPU with other threads. The use
> > > case can be easily reproduced with a periodic task TA that runs for
> > > 1ms and sleeps for 100us. When the task is alone on the CPU, its max
> > > utilization and its util_est are around 888. If another similar task
> > > starts to run on the same CPU, TA will have to share the CPU runtime
> > > and its maximum utilization will decrease to around half the CPU
> > > capacity (512), then TA's util_est will follow this new maximum,
> > > which is only the result of sharing the CPU with other tasks. Such a
> > > situation can be detected with runnable_avg, which is close or equal
> > > to util_avg when TA is alone but increases above util_avg when TA
> > > shares the CPU with other threads and waits on the runqueue.
> > >
> > > Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > > ---
> >
> > So effectively if we have two always-running tasks on the same CPU,
> > their util_est will converge to 1024 instead of 512 now, right?
> >
> > I guess this is more accurate, yes. And I think this will hit the same
> > limitation we can hit with uclamp_max. If, for example, there are two
> > tasks that are each 650 when they run alone, they would now appear as
> > 1024 (instead of 512) when they share the CPU, because running
> > combined there will be no idle time on the CPU and they will look like
> > always-running tasks, I think.
>
> Well, they might not converge to 1024. It will just prevent them from
> dropping the highest util_est seen on them before this contention
> happened.

Right, we just ignore the decay and hold on to the last seen value.
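
For clarity, here is a minimal standalone sketch of the idea as I
understand it (hypothetical names and a trivial update rule, not the
actual patch): runnable_avg tracks running plus waiting time while
util_avg tracks only running time, so when runnable_avg rises above
util_avg the task has been waiting on the runqueue and we skip the
decay, keeping the last seen estimate:

/*
 * Hypothetical, simplified sketch -- not the real util_est_update().
 * runnable_avg > util_avg means the task waited for the CPU, so its
 * observed utilization is capped by contention and should not drag
 * the estimate down.
 */
struct task_signals {
	unsigned long util_avg;     /* contribution while running */
	unsigned long runnable_avg; /* contribution while running or waiting */
	unsigned long util_est;     /* last remembered estimate */
};

#define CONTENTION_MARGIN 10	/* hypothetical tolerance, in PELT units */

static int task_was_contended(const struct task_signals *ts)
{
	return ts->runnable_avg > ts->util_avg + CONTENTION_MARGIN;
}

static void util_est_update_sketch(struct task_signals *ts,
				   unsigned long dequeued_util)
{
	/* hold the previous estimate while the task shares the CPU */
	if (task_was_contended(ts))
		return;

	/*
	 * Otherwise track the freshly observed utilization
	 * (the real code uses an EWMA; kept trivial here).
	 */
	ts->util_est = dequeued_util;
}

So the estimate only ever moves while the task actually gets the CPU
time it asked for.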

>
> >
> > If I got it right, I think this should not be a problem in practice,
> > because the only reason these two tasks will be stuck on the same CPU
> > is that the load balancer can't do anything to spread them, which
> > indicates the system must be busy anyway. Once there's more idle time
> > elsewhere, they should be spread out and converge to 650 again.
>
> It can be applicable to a real app. That chrome thread that I
> reported (which is ~950 util) drops its util and util_est in some
> scenarios when there are some other tasks in the runqueue, because
> of some short sleeps. Then this situation attracts other tasks to
> migrate, but the next time the big thread wakes up it has to share
> the CPU and loses its util_est (which was the last information
> that there was such a big util on it).
>
> Those update moments, when we go via the util_est_update() code path,
> occur quite often in a short time and don't fit into the PELT world,
> IMO. It's like an asynchronous force-update to the util_est signal,
> nothing like the way util is slowly built up. I think Peter had
> a similar impression when he asked me how often and how fast this
> update could be triggered such that we lose the value...
>
> I would even dare to call this patch a fix and a candidate for the
> stable tree.

It seems a genuine fix, yes. I am generally worried about the power impact
of util_est. But I tend to agree this is something worth considering for
a backport.


Cheers

--
Qais Yousef