Re: [PATCH v2 1/3] sched/pelt: Don't sync hardly util_sum with util_avg

From: Dietmar Eggemann
Date: Wed Jan 05 2022 - 08:15:36 EST


On 04/01/2022 14:42, Vincent Guittot wrote:
> On Tue, 4 Jan 2022 at 12:47, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>>
>> On 22/12/2021 10:38, Vincent Guittot wrote:

[...]

>> I still wonder whether the regression only comes from the changes in
>> update_cfs_rq_load_avg(), introduced by 1c35b07e6d39.
>> And could be fixed only by this part of the patch-set (A):
>
> I have been able to trigger the warning even with (A), though it took
> much more time.
> And I have been able to catch wrong situations (with additional
> traces) in the 3 places A, B and C.

OK. By wrong situation you mean '_sum < _avg * MIN_DIVIDER'?
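
For reference, the kind of check I have in mind looks roughly like the
sketch below (check_sa_sync() is a made-up helper, and I assume
MIN_DIVIDER is (LOAD_AVG_MAX - 1024) as introduced by patch 1/3):

	/*
	 * Catch _sum and _avg running out of sync: since
	 * _avg = _sum / divider with divider >= MIN_DIVIDER,
	 * _sum < _avg * MIN_DIVIDER should never happen.
	 */
	static inline void check_sa_sync(struct sched_avg *sa)
	{
		SCHED_WARN_ON(sa->load_sum     < sa->load_avg * MIN_DIVIDER);
		SCHED_WARN_ON(sa->util_sum     < sa->util_avg * MIN_DIVIDER);
		SCHED_WARN_ON(sa->runnable_sum < sa->runnable_avg * MIN_DIVIDER);
	}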

>> @@ -3677,15 +3706,22 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
>>
>>  		r = removed_load;
>>  		sub_positive(&sa->load_avg, r);
>> -		sa->load_sum = sa->load_avg * divider;
>> +		sub_positive(&sa->load_sum, r * divider);
>> +		sa->load_sum = max_t(u32, sa->load_sum, sa->load_avg * MIN_DIVIDER);
>>
>>  		r = removed_util;
>>  		sub_positive(&sa->util_avg, r);
>> -		sa->util_sum = sa->util_avg * divider;
>> +		sub_positive(&sa->util_sum, r * divider);
>> +		sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * MIN_DIVIDER);
>>
>>  		r = removed_runnable;
>>  		sub_positive(&sa->runnable_avg, r);
>> -		sa->runnable_sum = sa->runnable_avg * divider;
>> +		sub_positive(&sa->runnable_sum, r * divider);
>> +		sa->runnable_sum = max_t(u32, sa->runnable_sum,
>> +					 sa->runnable_avg * MIN_DIVIDER);
>>
>> i.e. w/o changing update_tg_cfs_X() (and
>> detach_entity_load_avg()/dequeue_load_avg()).
>>
>> update_load_avg()
>> update_cfs_rq_load_avg() <---
>> propagate_entity_load_avg()
>> update_tg_cfs_X() <---
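
As a reminder of where the invariant comes from: ___update_load_avg()
derives the _avg values from the _sum values, roughly like this if I
read kernel/sched/pelt.c correctly (quoting from memory):

	u32 divider = LOAD_AVG_MAX - 1024 + sa->period_contrib;

	sa->load_avg = div_u64(load * sa->load_sum, divider);
	sa->runnable_avg = div_u64(sa->runnable_sum, divider);
	WRITE_ONCE(sa->util_avg, sa->util_sum / divider);

Since period_contrib >= 0, we have divider >= LOAD_AVG_MAX - 1024 =
MIN_DIVIDER, so `_sum >= _avg * MIN_DIVIDER` holds right after an
update. It's only the syncs/subtractions done in between which can
break it.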
>>
>>
>> I didn't see the SCHED_WARN_ON() [cfs_rq_is_decayed()] when running
>> hackbench in a loop on several different sched group levels on
>> [Hikey620 (Arm64, 8 CPUs, SMP, 4 taskgroups: A/B C/D E/F G/H), >12h uptime].
>
> IIRC, it was with hikey960 with cgroup v1.
> As a side note, I never triggered the problem with dragonboard845 and cgroup v2.

OK, I just started a test on hikey960 with cgroup v1. Let's see if I can catch it.

[...]

>>> @@ -3780,7 +3799,11 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
>>>
>>>  	dequeue_load_avg(cfs_rq, se);
>>>  	sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg);
>>> -	cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * divider;
>>> +	sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum);
>>> +	/* See update_tg_cfs_util() */
>>> +	cfs_rq->avg.util_sum = max_t(u32, cfs_rq->avg.util_sum,
>>> +				     cfs_rq->avg.util_avg * MIN_DIVIDER);
>>> +
>>
>> Maybe add a:
>>
>> Fixes: fcf6631f3736 ("sched/pelt: Ensure that *_sum is always synced
>> with *_avg")
>
> I spent time thinking about adding a Fixes tag. There is no crash/warn
> so far, so should we propagate it back in LTS for better performance?

Not sure I understand. What do you mean by 'should we propagate it back
in LTS'?

[...]

>> This max_t() should make sure that `_sum >= _avg * MIN_DIVIDER` always
>> holds, which is sometimes not the case. Currently this is done in
>>
>> (1) update_cfs_rq_load_avg()
>> (2) detach_entity_load_avg() and dequeue_load_avg()
>> (3) update_tg_cfs_X()
>>
>> but not in attach_entity_load_avg() and enqueue_load_avg(). What's the
>> reason for this?
>
> The main reason is that I have never seen the problem there.
> Also, the problem comes from subtracting a task's value, whereas here
> we always add a positive value.

OK, I see. The add_positive() calls in update_tg_cfs_X() deal with `long` values.
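
For completeness, the two helpers as I remember them from
kernel/sched/fair.c (quoting from memory, so better double-check
against the tree):

	/* Unsigned subtract and clamp on underflow. */
	#define sub_positive(_ptr, _val) do {			\
		typeof(_ptr) ptr = (_ptr);			\
		typeof(*ptr) val = (_val);			\
		typeof(*ptr) res, var = READ_ONCE(*ptr);	\
		res = var - val;				\
		if (res > var)					\
			res = 0;				\
		WRITE_ONCE(*ptr, res);				\
	} while (0)

	/* Signed add and clamp on underflow. */
	#define add_positive(_ptr, _val) do {			\
		typeof(_ptr) ptr = (_ptr);			\
		typeof(_val) val = (_val);			\
		typeof(*ptr) res, var = READ_ONCE(*ptr);	\
								\
		res = var + val;				\
								\
		if (val < 0 && res > var)			\
			res = 0;				\
								\
		WRITE_ONCE(*ptr, res);				\
	} while (0)

I.e. add_positive() takes a signed delta (hence the `long` values in
update_tg_cfs_X()) and clamps the result to 0 on underflow, whereas
sub_positive() operates on the unsigned _avg/_sum fields directly.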