Re: [RFC PATCH 1/6] sched/uclamp: Track uclamped util_avg in sched_avg

From: Dietmar Eggemann
Date: Tue Nov 14 2023 - 07:59:10 EST


On 09/11/2023 17:05, Hongyan Xia wrote:
> On 31/10/2023 15:52, Dietmar Eggemann wrote:
>> On 04/10/2023 11:04, Hongyan Xia wrote:
>>> From: Hongyan Xia <hongyan.xia2@xxxxxxx>
>>
>> [...]
>>
>>> @@ -6445,6 +6450,21 @@ static int sched_idle_cpu(int cpu)
>>>   }
>>>   #endif
>>> +void ___update_util_avg_uclamp(struct sched_avg *avg, struct sched_entity *se);
>>
>> IMHO, `struct sched_avg *avg` can only be the one of a cfs_rq. So
>> passing a cfs_rq would eliminate the question whether this can be from
>> another se.
>
> At the moment, yes, the avg can only come from cfs_rq. The reason why I
> kept sched_avg is that once we have sum aggregation for RT tasks, it's
> very likely we will end up calling this function on rt_rq->avg, so
> having just sched_avg here will work for RT in the future.

Ah, OK. IMHO it would be better to use cfs_rq for now and widen this
interface once RT is covered.
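
Something like this is what I mean, just as a sketch of the narrower
prototype (the sched_avg based variant can still be added later, once RT
sum aggregation exists):

  /* sketch: cfs_rq-only interface for now, widen once RT is covered */
  void ___update_util_avg_uclamp(struct cfs_rq *cfs_rq, struct sched_entity *se);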

[...]

>> Question for me is why can't you integrate the util_avg_uclamp signals
>> for se's and cfs_rq's/rq's much closer into existing PELT functions?
>
> So the problem is that when we enqueue the task (say UCLAMP_MIN of 200),
> at that exact moment se->on_rq is false, so we only decay and not
> enforce UCLAMP_MIN. Further up in the hierarchy we do update_load_avg(),
> but the se of the task has already been processed so UCLAMP_MIN has not
> taken any effect. To make sure UCLAMP_MIN is immediately effective, I
> just re-evaluate the whole hierarchy from bottom to top.
>
> I probably didn't quite catch what you said here. Could you elaborate a
> bit on what 'much closer' means?

I see. But can we not use DO_ATTACH (and DO_DETACH) for this?

(flags & DO_ATTACH) characterizes the enqueuing of the task. So with
DO_ATTACH (and hence !se->on_rq and cfs_rq->curr != se) we could (a)
decay the task and (b) add it to the cfs_rq in enqueue_entity() ->
update_load_avg(..., UPDATE_TG | DO_ATTACH).
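
Roughly like this (only a sketch, not a tested patch; the cfs_rq based
___update_util_avg_uclamp() from above is hypothetical):

  static inline void update_load_avg(struct cfs_rq *cfs_rq,
                                     struct sched_entity *se, int flags)
  {
          ...
          __update_load_avg_se(now, cfs_rq, se);          /* (a) decay the se */

          if (!se->avg.last_update_time && (flags & DO_ATTACH)) {
                  attach_entity_load_avg(cfs_rq, se);     /* wakeup migration */
                  update_tg_load_avg(cfs_rq);
          }

          if (flags & DO_ATTACH)
                  ___update_util_avg_uclamp(cfs_rq, se);  /* (b) add clamped util */
          ...
  }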

Like we do for PELT on a wakeup-migration enqueue (a):

update_load_avg()

    __update_load_avg_se()

        if (___update_load_sum(..., cfs_rq->curr == se))
                                    ^^^^^^^^^^^^^^^^^^
                                    'int running' for utilization

        ___update_load_avg()

    if (!se->avg.last_update_time && (flags & DO_ATTACH))
        ^^^^^^^^^^^^^^^^^^^^^^^^^
        wakeup migration

        attach_entity_load_avg()

Notice, for PELT we attach to/detach from the cfs_rq, which gives us
blocked contribution. For util_avg_uclamp we enqueue/dequeue, so we only
get runnable contribution.
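
E.g. on the dequeue side (only a sketch; the util_avg_uclamp field names
are taken from your patch, the placement and the sub_positive() use here
are hypothetical):

  static void
  dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
  {
          int action = UPDATE_TG;

          /* PELT: detach only when the task migrates away */
          if (entity_is_task(se) && task_on_rq_migrating(task_of(se)))
                  action |= DO_DETACH;
          ...
          update_load_avg(cfs_rq, se, action);

          /* hypothetical: drop the clamped contribution on _every_ dequeue */
          sub_positive(&cfs_rq->avg.util_avg_uclamp, se->avg.util_avg_uclamp);
          ...
  }

so a sleeping task leaves nothing behind in cfs_rq->avg.util_avg_uclamp,
while its util_avg stays attached until it migrates.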