Re: [PATCH v2] sched/uclamp: Avoid getting unreasonable uclamp value when rq is idle

From: Qais Yousef
Date: Fri Jul 02 2021 - 07:54:27 EST


On 07/02/21 13:12, Peter Zijlstra wrote:
> On Wed, Jun 30, 2021 at 10:12:04PM +0800, Xuewen Yan wrote:
> > From: Xuewen Yan <xuewen.yan@xxxxxxxxxx>
> >
> > Now in uclamp_rq_util_with(), when the task != NULL, uclamp_max is computed as follows:
> > uc_rq_max = rq->uclamp[UCLAMP_MAX].value;
> > uc_eff_max = uclamp_eff_value(p, UCLAMP_MAX);
> > uclamp_max = max{uc_rq_max, uc_eff_max};
> >
> > Consider the following scenario:
> > (1) the rq is idle, so uc_rq_max is the last runnable task's UCLAMP_MAX;
> > (2) p's uc_eff_max < uc_rq_max.
> >
> > As a result, uclamp_max = uc_rq_max instead of uc_eff_max, which is unreasonable.
> >
> > This scenario often happens in find_energy_efficient_cpu() when the task has a smaller UCLAMP_MAX.
> >
> > When the rq has the UCLAMP_FLAG_IDLE flag set, enqueuing the task will clear
> > UCLAMP_FLAG_IDLE and set the rq clamp to the task's via uclamp_idle_reset(),
> > so there is no need to read the rq clamp. Skipping it also avoids the problem
> > described above.
> >
> > Fixes: 9d20ad7dfc9a ("sched/uclamp: Add uclamp_util_with()")
> >
> > Signed-off-by: Xuewen Yan <xuewen.yan@xxxxxxxxxx>
>
> Valentin, Qais, can either of you write a Changelog/comment for this, I
> can't seem to make any sense of it.

Err, yeah I think I've been staring at uclamp for too long. It could be
clearer.

>
> Is this about wake-from-idle, where the first task's uclamp goes amiss
> because the rq->uclamp values haven't been updated yet?

Yep. How about the below?

--->8---

sched/uclamp: Ignore max aggregation if rq is idle

When a task wakes up on an idle rq, uclamp_rq_util_with() max aggregates
with the rq values. But since no task is enqueued yet, those values are
stale: they reflect the last task that was running. Once the new task
is actually enqueued, the rq uclamp values should reflect the effective
uclamp values of the newly woken up task.

This is a problem particularly for uclamp_max because it defaults to
1024. If a task p with uclamp_max = 512 wakes up, then max aggregation
would ignore the capping that should apply when this task is enqueued,
which is wrong.

Fix that by skipping max aggregation if the rq is idle, since in that
case the effective uclamp values of the rq will be those of the task
that is waking up.

--->8---
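
To make the scenario concrete, here is a rough userspace sketch of the
aggregation, with and without the idle check. It is only a model, not
the kernel code: struct model_rq, struct model_task,
model_rq_util_with() and the skip_idle_rq switch are made up for
illustration, and the value of UCLAMP_FLAG_IDLE is assumed.

```c
#include <stdbool.h>

/* Hypothetical model: names mirror the kernel's, but this is not kernel code. */
enum { UCLAMP_MIN, UCLAMP_MAX, UCLAMP_CNT };

#define UCLAMP_FLAG_IDLE 0x01	/* assumed value, for illustration only */

struct model_rq {
	unsigned int uclamp[UCLAMP_CNT];	/* aggregated rq clamps */
	unsigned int uclamp_flags;
};

struct model_task {
	unsigned int uclamp_eff[UCLAMP_CNT];	/* effective task clamps */
};

static unsigned int max_u(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

static unsigned int clamp_u(unsigned int val, unsigned int lo, unsigned int hi)
{
	if (val < lo)
		return lo;
	return val > hi ? hi : val;
}

/*
 * Clamp util as if task p were enqueued on rq. With skip_idle_rq set,
 * the stale rq clamps are ignored when the rq is flagged idle, which
 * is the behaviour proposed in the changelog above.
 */
unsigned int model_rq_util_with(const struct model_rq *rq, unsigned int util,
				const struct model_task *p, bool skip_idle_rq)
{
	unsigned int min_util = 0;
	unsigned int max_util = 0;

	if (p) {
		min_util = p->uclamp_eff[UCLAMP_MIN];
		max_util = p->uclamp_eff[UCLAMP_MAX];

		/* An idle rq still holds the last runnable task's clamps. */
		if (skip_idle_rq && (rq->uclamp_flags & UCLAMP_FLAG_IDLE))
			goto out;
	}

	/* Max aggregate the task clamps with the rq clamps. */
	min_util = max_u(min_util, rq->uclamp[UCLAMP_MIN]);
	max_util = max_u(max_util, rq->uclamp[UCLAMP_MAX]);
out:
	/* If the clamps are inverted, the min clamp wins. */
	if (min_util >= max_util)
		return min_util;

	return clamp_u(util, min_util, max_util);
}
```

With an idle rq whose stale uclamp_max is 1024 and a task p with
uclamp_max = 512, a util of 800 comes back as 800 without the idle
check and as 512 with it.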

Thanks

--
Qais Yousef