Re: [PATCH] sched/fair: schedutil: update only with all info available

From: Vincent Guittot
Date: Wed Apr 11 2018 - 02:58:37 EST


On 10 April 2018 at 13:04, Patrick Bellasi <patrick.bellasi@xxxxxxx> wrote:
> Hi Vincent,
>
> On 09-Apr 10:51, Vincent Guittot wrote:
>> Hi Patrick
>>
>> On 6 April 2018 at 19:28, Patrick Bellasi <patrick.bellasi@xxxxxxx> wrote:
>> > Schedutil is not properly updated when the first FAIR task wakes up on a
>> > CPU and when a RQ is (un)throttled. This is mainly due to the current
>> > integration strategy, which relies on updates being triggered implicitly
>> > each time a cfs_rq's utilization is updated.
>> >
>> > Those updates are currently provided (mainly) via
>> > cfs_rq_util_change()
>> > which is used in:
>> > - update_cfs_rq_load_avg()
>> > when the utilization of a cfs_rq is updated
>> > - {attach,detach}_entity_load_avg()
>> > This is done based on the idea that "we should callback schedutil
>> > frequently enough" to properly update the CPU frequency at every
>> > utilization change.
>> >
>> > Since this recent schedutil update:
>> >
>> > commit 8f111bc357aa ("cpufreq/schedutil: Rewrite CPUFREQ_RT support")
>> >
>> > we use additional RQ information to properly account for FAIR tasks
>> > utilization. Specifically, cfs_rq::h_nr_running has to be non-zero
>> > in sugov_aggregate_util() to sum up the cfs_rq's utilization.
>>
>> Isn't the use of cfs_rq::h_nr_running the root cause of the problem?
>
> Not really...
>
>> I can now see a lot of frequency changes on my hikey with this new
>> condition in sugov_aggregate_util().
>> With an rt-app use case that creates a periodic cfs task, I get a lot of
>> frequency changes instead of staying at the same frequency.
>
> I don't remember a similar behavior... but I'll check better.

I discovered this behavior quite recently while preparing for OSPM.

>
>> Peter,
>> what was your goal with adding the condition "if
>> (rq->cfs.h_nr_running)" for the aggregation of CFS utilization?
>
> The original intent was to get rid of sched class flags, used to track
> which class has tasks runnable from within schedutil. The reason was
> to solve some misalignment between scheduler class status and
> schedutil status.

This misalignment was mainly an issue for RT tasks; it was not the case
for cfs tasks before commit 8f111bc357aa.

>
> The solution, initially suggested by Viresh and finally proposed by
> Peter, was to exploit RQ knowledge directly from within schedutil.
>
> The problem is that schedutil updates now depend on two pieces of
> information: utilization changes and the number of runnable RT and CFS
> tasks.
>
> Thus, using cfs_rq::h_nr_running is not the problem... it's actually
> part of a much cleaner solution than the code we used to have.

So there are 2 problems there:
- using cfs_rq::h_nr_running when aggregating cfs utilization, which
generates a lot of frequency drops
- making sure that the nr_running counters are up to date when used in
schedutil
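
To make both points concrete, here is a minimal user-space sketch of the
aggregation as it is described in this thread. It is not the kernel code:
struct model_cpu, model_aggregate_util() and the numbers used with them are
hypothetical, and the deadline term and the clamp to max capacity are
assumptions added to keep the model complete.

```c
/*
 * Hypothetical, simplified model of what schedutil sees for one CPU.
 * Field names only loosely follow the kernel; this is not kernel code.
 */
struct model_cpu {
	unsigned long max;             /* CPU capacity */
	unsigned long util_cfs;        /* utilization of the root cfs_rq */
	unsigned long util_dl;         /* deadline utilization (assumed term) */
	unsigned int rt_nr_running;    /* runnable RT tasks */
	unsigned int cfs_h_nr_running; /* runnable CFS tasks, hierarchical */
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Aggregate utilization the way the thread describes it: a runnable RT
 * task requests max capacity, and the CFS utilization is only summed
 * when cfs_rq::h_nr_running is non-zero.
 */
static unsigned long model_aggregate_util(const struct model_cpu *cpu)
{
	unsigned long util;

	if (cpu->rt_nr_running)
		return cpu->max;

	util = cpu->util_dl;
	if (cpu->cfs_h_nr_running)
		util += cpu->util_cfs;

	return min_ul(util, cpu->max);
}
```

With max = 1024 and util_cfs = 300, the model returns 300 while the periodic
task is runnable and 0 as soon as h_nr_running reaches 0, although util_cfs
has barely decayed: that is the frequency drop of the first point. The same
0 is returned if the update is triggered at wakeup before h_nr_running has
been incremented, which is the second point.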

>
> The problem, IMO, is that we now depend on other information which
> needs to be in sync before calling schedutil... and the patch I
> proposed is meant to make it less likely that the required
> information is out of sync (also in the future).
>
> --
> #include <best/regards.h>
>
> Patrick Bellasi