Re: [PATCH] PM / EM: Inefficient OPPs detection

From: Vincent Donnefort
Date: Thu Apr 22 2021 - 11:36:26 EST


> > As used in the hot-path, the efficient table is a lookup table, generated
> > dynamically when the perf domain is created. The complexity of searching
> > a performance state is hence changed from O(n) to O(1). This also
> > speeds-up em_cpu_energy() even if no inefficient OPPs have been found.
>
> Interesting. Do you have measurements showing the benefits on wake-up
> duration? I remember doing so by hacking the wake-up path to force tasks
> into feec()/compute_energy() even when overutilized, and then running
> hackbench. Maybe something like that would work for you?
>
> Just want to make sure we actually need all that complexity -- while
> it's good to reduce the asymptotic complexity, we're looking at a rather
> small problem (max 30 OPPs or so I expect?), so other effects may be
> dominating. Simply skipping inefficient OPPs could be implemented in a
> much simpler way I think.
>
> Thanks,
> Quentin

On the Pixel4, I used rt-app to generate a task whom duty cycle is getting
higher for each phase. Then for each rt-app task placement, I measured how long
find_energy_efficient_cpu() took to run. I repeated the operation several
times to increase the count. Here's what I've got:

┌────────┬─────────────┬───────┬────────────────┬───────────────┬───────────────┐
│ Phase │ duty-cycle │ CPU │ w/o LUT │ w/ LUT │ │
│ │ │ ├────────┬───────┼───────┬───────┤ Diff │
│ │ │ │ Mean │ count │ Mean │ count │ │
├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
│ 0 │ 12.5% │ Little│ 10791 │ 3124 │ 10657 │ 3741 │ -1.2% -134ns │
├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
│ 1 │ 25% │ Mid │ 2924 │ 3097 │ 2894 │ 3740 │ -1% -30ns │
├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
│ 2 │ 37.5% │ Mid │ 2207 │ 3104 │ 2162 │ 3740 │ -2% -45ns │
├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
│ 3 │ 50% │ Mid │ 1897 │ 3119 │ 1864 │ 3717 │ -1.7% -33ns │
├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
│ │ │ Mid │ 1700 │ 396 │ 1609 │ 1232 │ -5.4% -91ns │
│ 4 │ 62.5% ├───────┼────────┼───────┼───────┼───────┼───────────────┤
│ │ │ Big │ 1187 │ 2729 │ 1129 │ 2518 │ -4.9% -58ns │
├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
│ 5 │ 75% │ Big │ 984 │ 3124 │ 900 │ 3693 │ -8.5% -84ns │
└────────┴─────────────┴───────┴────────┴───────┴───────┴───────┴───────────────┘

Notice:

* The CPU column describes which CPU ran the find_energy_efficient()
function.

* I modified my patch so that no inefficient OPPs are reported. This is to
have a fairer comparison between the original table walk and the lookup
table.

* I removed from the table results that didn't have enough count to be
statistically significant.

--
Vincent.