Re: [RFC PATCH 0/1] sched/pelt: Change PELT halflife at runtime

From: Wei Wang
Date: Wed Oct 05 2022 - 12:57:39 EST


On Tue, Oct 4, 2022 at 2:33 AM Dietmar Eggemann
<dietmar.eggemann@xxxxxxx> wrote:
>
> Hi Wei,
>
> On 04/10/2022 00:57, Wei Wang wrote:
>
> Please don't do top-posting.
>

Sorry, forgot this was posted to the list...

> > We have some data on an earlier build of Pixel 6a, which also runs a
> > slightly modified "sched" governor. The tuning definitely has both
> > performance and power impact on UX. With some additional user space
> > hints such as ADPF (Android Dynamic Performance Framework) and/or the
> > old-fashioned INTERACTION power hint, different trade-offs can be
> > achieved with this sort of tuning.
> >
> >
> > +---------------------------------------------------------+----------+----------+
> > | Metrics                                                 | 32ms     | 8ms      |
> > +---------------------------------------------------------+----------+----------+
> > | Sum of gfxinfo_com.android.test.uibench_deadline_missed |   185.00 |   112.00 |
> > | Sum of SFSTATS_GLOBAL_MISSEDFRAMES                      |    62.00 |    49.00 |
> > | CPU Power                                               | 6,204.00 | 7,040.00 |
> > | Sum of Gfxinfo.frame.95th                               |   582.00 |   506.00 |
> > | Avg of Gfxinfo.frame.95th                               |    18.19 |    15.81 |
> > +---------------------------------------------------------+----------+----------+
>
> Which App is package `gfxinfo_com.android.test`? Is this UIBench? Never
> ran it.
>

Yes.

> I'm familiar with `dumpsys gfxinfo <PACKAGE_NAME>`.
>
> # adb shell dumpsys gfxinfo <PACKAGE_NAME>
>
> ...
> ** Graphics info for pid XXXX [<PACKAGE_NAME>] **
> ...
> 95th percentile: XXms <-- (a)
> ...
> Number Frame deadline missed: XX <-- (b)
> ...
>
>
> I assume that `Gfxinfo.frame.95th` is related to (a) and
> `gfxinfo_com.android.test.uibench_deadline_missed` to (b)? Not sure
> where `SFSTATS_GLOBAL_MISSEDFRAMES` is coming from?
>

(a) is correct; (b) is from surfaceflinger. The Android display
pipeline involves both (a) the app (frame generation) and (b)
surfaceflinger (presentation).

> What's the Sum here? Is it that you ran the test 32 times (582/18.19 = 32)?
>

UIBench [1] consists of several micro tests; the Sum is the total
across those tests.


[1]: https://cs.android.com/android/platform/superproject/+/master:platform_testing/tests/microbenchmarks/uibench/src/com/android/uibench/microbenchmark/
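
For reference, the relative changes between the 32ms and 8ms columns of
the table quoted above, and the test count implied by Sum/Avg, can be
recomputed with a short sketch like the one below. The numbers are
copied from that table; the short dictionary keys and the helper names
are only illustrative labels and do not come from any tool.

```python
# Values copied from the quoted table: (32 ms half-life, 8 ms half-life).
# The keys are illustrative labels, not identifiers from dumpsys or UIBench.
RESULTS = {
    "deadline_missed_sum": (185.00, 112.00),
    "sf_missed_frames_sum": (62.00, 49.00),
    "cpu_power": (6204.00, 7040.00),
    "frame_95th_sum": (582.00, 506.00),
    "frame_95th_avg": (18.19, 15.81),
}


def percent_change(base, new):
    """Relative change of `new` versus `base`, in percent."""
    return (new - base) / base * 100.0


def implied_test_count(total, average):
    """Number of samples implied by a sum and its average, rounded."""
    return round(total / average)


if __name__ == "__main__":
    for name, (v32, v8) in RESULTS.items():
        print(f"{name:22s} {percent_change(v32, v8):+7.2f}%")
    print("implied tests (32ms):", implied_test_count(582.00, 18.19))
    print("implied tests (8ms): ", implied_test_count(506.00, 15.81))
```

With these inputs the missed-deadline sum drops by roughly 39% while CPU
power rises by roughly 13%, which is the trade-off being discussed.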


> [...]
>
> > On Thu, Sep 29, 2022 at 11:59 PM Kajetan Puchalski
> > <kajetan.puchalski@xxxxxxx> wrote:
> >>
> >> On Thu, Sep 29, 2022 at 01:21:45PM +0200, Peter Zijlstra wrote:
> >>> On Thu, Sep 29, 2022 at 12:10:17PM +0100, Kajetan Puchalski wrote:
> >>>
> >>>> Overall, the problem being solved here is that based on our testing the
> >>>> PELT half life can occasionally be too slow to keep up in scenarios
> >>>> where many frames need to be rendered quickly, especially on high-refresh
> >>>> rate phones and similar devices.
> >>>
> >>> But it is a problem of DVFS not ramping up quick enough; or of the
> >>> load-balancer not reacting to the increase in load, or what aspect
> >>> controlled by PELT is responsible for the improvement seen?
> >>
> >> Based on all the tests we've seen, jankbench or otherwise, the
> >> improvement can mainly be attributed to the faster ramp up of frequency
> >> caused by the shorter PELT window while using schedutil. Alongside that
> >> the signals rising faster also mean that the task would get migrated
> >> faster to bigger CPUs on big.LITTLE systems which improves things too
> >> but it's mostly the frequency aspect of it.
> >>
> >> To establish that this benchmark is sensitive to frequency I ran some
> >> tests using the 'performance' cpufreq governor.
> >>
> >> Max frame duration (ms)
> >>
> >> +------------------+-------------+----------+
> >> | kernel | iteration | value |
> >> |------------------+-------------+----------|
> >> | pelt_1 | 10 | 157.426 |
> >> | pelt_4 | 10 | 85.2713 |
> >> | performance | 10 | 40.9308 |
> >> +------------------+-------------+----------+
> >>
> >> Mean frame duration (ms)
> >>
> >> +---------------+------------------+---------+-------------+
> >> | variable | kernel | value | perc_diff |
> >> |---------------+------------------+---------+-------------|
> >> | mean_duration | pelt_1 | 14.6 | 0.0% |
> >> | mean_duration | pelt_4 | 14.5 | -0.58% |
> >> | mean_duration | performance | 4.4 | -69.75% |
> >> +---------------+------------------+---------+-------------+
> >>
> >> Jank percentage
> >>
> >> +------------+------------------+---------+-------------+
> >> | variable | kernel | value | perc_diff |
> >> |------------+------------------+---------+-------------|
> >> | jank_perc | pelt_1 | 2.1 | 0.0% |
> >> | jank_perc | pelt_4 | 2 | -3.46% |
> >> | jank_perc | performance | 0.1 | -97.25% |
> >> +------------+------------------+---------+-------------+
> >>
> >> As you can see, bumping up frequency can hugely improve the results
> >> here. This is what's happening when we decrease the PELT window, just on
> >> a much smaller and not as drastic scale. It also explains specifically
> >> where the increased power usage is coming from.
>
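
As a rough illustration of why a shorter half-life ramps frequency
faster (this is not taken from the measurements above), the sketch below
uses a continuous-time approximation of the PELT geometric series for a
task that starts at zero utilization and then runs continuously. It
assumes a maximum utilization of 1024 and ignores the 1024us
accumulation periods, schedutil's headroom and rate limiting, and any
migration between CPUs. The function names are hypothetical.

```python
import math


def pelt_util(t_ms, halflife_ms, max_util=1024.0):
    """Approximate utilization after running continuously for t_ms from 0.

    Continuous-time approximation: the remaining gap to max_util halves
    every halflife_ms.
    """
    return max_util * (1.0 - 0.5 ** (t_ms / halflife_ms))


def time_to_reach(fraction, halflife_ms):
    """Time in ms for the signal to reach `fraction` (0..1) of max_util."""
    return halflife_ms * math.log2(1.0 / (1.0 - fraction))


if __name__ == "__main__":
    for halflife in (32, 8):
        t80 = time_to_reach(0.8, halflife)
        print(f"half-life {halflife:2d} ms: 80% of max after {t80:5.1f} ms, "
              f"util after 16 ms = {pelt_util(16, halflife):6.1f}")
```

Under these assumptions the time to reach a given utilization scales
linearly with the half-life, so going from 32ms to 8ms cuts the ramp
time by a factor of four.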