Re: [PATCH] sched/pelt: Fix task util_est update filtering

From: Dietmar Eggemann
Date: Fri Feb 19 2021 - 05:20:21 EST


On 16/02/2021 17:39, vincent.donnefort@xxxxxxx wrote:
> From: Vincent Donnefort <vincent.donnefort@xxxxxxx>
>
> Because it is called on each dequeue, util_est limits the number of its
> updates by filtering out those where the EWMA signal differs from the
> task util_avg by less than 1%. This is a problem for a sudden util_avg
> ramp-up: due to the decay from a previous high util_avg, the EWMA might
> now be close enough to the new util_avg. No update would then happen,
> leaving ue.enqueued with an out-of-date value.

(1) enqueued[x-1] < ewma[x-1]

(2) diff(enqueued[x], ewma[x]) < 1024/100 && enqueued[x] < ewma[x] (*)

with ewma[x-1] == ewma[x]

(*) With the default UTIL_EST_FASTUP, enqueued[x] must still be less than
ewma[x]. Otherwise we would already 'goto done' (write the new util_est)
via the preceding if condition.
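A minimal user-space sketch of the pre-fix filtering may help to see how (1) and (2) leave ue.enqueued stale. It is modeled on util_est_update() in kernel/sched/fair.c but simplified: it assumes task_sleep is true and UTIL_EST_FASTUP is enabled, and it drops the UTIL_AVG_UNCHANGED flag and any other early return. The function name util_est_update_old() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, user-space model of the pre-fix util_est filter. */
#define SCHED_CAPACITY_SCALE	1024
#define UTIL_EST_MARGIN		(SCHED_CAPACITY_SCALE / 100)	/* ~1%: 10 */
#define UTIL_EST_WEIGHT_SHIFT	2				/* EWMA weight 1/4 */

struct util_est {
	unsigned int enqueued;
	unsigned int ewma;
};

/* True iff -margin < value < margin (unsigned-compare trick). */
static bool within_margin(int value, int margin)
{
	assert(margin > 0);
	return (unsigned int)(value + margin - 1) < (unsigned int)(2 * margin - 1);
}

/*
 * Hypothetical helper: returns the util_est that ends up stored after a
 * dequeue with the given util_avg. Assumes task_sleep, UTIL_EST_FASTUP on,
 * and no UTIL_AVG_UNCHANGED flag.
 */
static struct util_est util_est_update_old(struct util_est stored,
					   unsigned int util_avg)
{
	struct util_est ue = stored;
	long last_ewma_diff;

	ue.enqueued = util_avg;
	if (ue.ewma < ue.enqueued) {		/* UTIL_EST_FASTUP */
		ue.ewma = ue.enqueued;
		return ue;			/* "goto done" */
	}

	/* Skip the write if EWMA is within ~1% of the new util_avg ... */
	last_ewma_diff = (long)ue.enqueued - (long)ue.ewma;
	if (within_margin((int)last_ewma_diff, UTIL_EST_MARGIN))
		return stored;			/* ... even if enqueued is stale */

	/* ewma = (3 * ewma + enqueued) / 4, in integer arithmetic */
	ue.ewma <<= UTIL_EST_WEIGHT_SHIFT;
	ue.ewma += last_ewma_diff;
	ue.ewma >>= UTIL_EST_WEIGHT_SHIFT;
	return ue;
}
```

For example, starting from enqueued = ewma = 500, a dequeue with util_avg = 100 stores {100, 400}, which is (1). A following dequeue with util_avg = 395 gives a diff of -5 against ewma = 400, which is (2), so nothing is written and the stored enqueued stays at 100.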

>
> Taking both util_est members, EWMA and enqueued, into account for the
> filtering ensures an up-to-date value for both.
>
> For now, this is only an issue for the trace probe, which might report
> the stale value. Functionally, it isn't (yet) a problem, as the value is
> always accessed through max(enqueued, ewma).

Yeah, I remember that the ue.enqueued plots looked weird in these
sections with stale ue.enqueued values.
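The fix can be sketched as a user-space model of util_est_update(), assuming task_sleep is true, UTIL_EST_FASTUP is enabled and the UTIL_AVG_UNCHANGED flag is left out: the previous enqueued value is kept, and when the EWMA diff is within the margin the new util_est is still written if enqueued itself moved by ~1% or more. util_est_update_fixed() and util_est_value() are hypothetical names; the latter only mirrors the max(enqueued, ewma) read.

```c
#include <assert.h>
#include <stdbool.h>

#define SCHED_CAPACITY_SCALE	1024
#define UTIL_EST_MARGIN		(SCHED_CAPACITY_SCALE / 100)	/* ~1%: 10 */
#define UTIL_EST_WEIGHT_SHIFT	2				/* EWMA weight 1/4 */

struct util_est {
	unsigned int enqueued;
	unsigned int ewma;
};

/* True iff -margin < value < margin (unsigned-compare trick). */
static bool within_margin(int value, int margin)
{
	assert(margin > 0);
	return (unsigned int)(value + margin - 1) < (unsigned int)(2 * margin - 1);
}

/* Hypothetical stand-in for how the estimate is consumed. */
static unsigned int util_est_value(struct util_est ue)
{
	return ue.ewma > ue.enqueued ? ue.ewma : ue.enqueued;
}

/* Hypothetical model of the fixed filter: considers both members. */
static struct util_est util_est_update_fixed(struct util_est stored,
					     unsigned int util_avg)
{
	struct util_est ue = stored;
	long last_ewma_diff, last_enqueued_diff;

	last_enqueued_diff = ue.enqueued;	/* previous enqueued value */
	ue.enqueued = util_avg;
	if (ue.ewma < ue.enqueued) {		/* UTIL_EST_FASTUP */
		ue.ewma = ue.enqueued;
		return ue;			/* "goto done" */
	}

	last_ewma_diff = (long)ue.enqueued - (long)ue.ewma;
	last_enqueued_diff -= (long)ue.enqueued;
	if (within_margin((int)last_ewma_diff, UTIL_EST_MARGIN)) {
		/* EWMA is close enough, but enqueued moved: write anyway. */
		if (!within_margin((int)last_enqueued_diff, UTIL_EST_MARGIN))
			return ue;		/* "goto done", ewma unchanged */
		return stored;			/* both within ~1%: skip write */
	}

	/* ewma = (3 * ewma + enqueued) / 4, in integer arithmetic */
	ue.ewma <<= UTIL_EST_WEIGHT_SHIFT;
	ue.ewma += last_ewma_diff;
	ue.ewma >>= UTIL_EST_WEIGHT_SHIFT;
	return ue;
}
```

With a stored {enqueued = 100, ewma = 400} and a new util_avg of 395, this model writes {395, 400}, while max(enqueued, ewma) stays at 400, consistent with only the trace probe seeing a difference.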

> This problem has been observed using LISA's UtilConvergence:test_means on
> the sd845c board.

I ran the test a couple of times on my Juno board and I never hit this
path (util_est_within_margin(last_ewma_diff) &&
!util_est_within_margin(last_enqueued_diff)) for a test task.
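For reference, the path above is the one where the EWMA diff is inside the ~1% margin but the enqueued diff is not. A small sketch of that predicate, assuming a margin of SCHED_CAPACITY_SCALE / 100 (10 with integer division) and the unsigned-compare trick used by within_margin() in kernel/sched/fair.c; hits_enqueued_only_path() is a hypothetical name, not the patch's util_est_within_margin().

```c
#include <assert.h>
#include <stdbool.h>

#define SCHED_CAPACITY_SCALE	1024
#define UTIL_EST_MARGIN		(SCHED_CAPACITY_SCALE / 100)	/* 10 */

/*
 * True iff -margin < value < margin. Adding (margin - 1) maps the open
 * interval onto [0, 2 * margin - 2]; everything else, including negative
 * results that wrap around as unsigned, compares >= 2 * margin - 1.
 */
static bool within_margin(int value, int margin)
{
	assert(margin > 0);
	return (unsigned int)(value + margin - 1) < (unsigned int)(2 * margin - 1);
}

/* Hypothetical predicate: EWMA filtered out, but enqueued still changed. */
static bool hits_enqueued_only_path(long last_ewma_diff, long last_enqueued_diff)
{
	return within_margin((int)last_ewma_diff, UTIL_EST_MARGIN) &&
	       !within_margin((int)last_enqueued_diff, UTIL_EST_MARGIN);
}
```

With, e.g., ewma = 400, a stored enqueued of 100 and a new util_avg of 395, the diffs are -5 and -295, so the predicate is true; it is false as soon as either diff falls on the other side of the margin.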

I can't see how this issue could be board-specific. Does it happen
reliably on sd845c, or only very, very occasionally?

I saw it a couple of times, but always with (non-test) tasks migrating
from one CPU to another.

> Signed-off-by: Vincent Donnefort <vincent.donnefort@xxxxxxx>

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>

[...]