RE: [RFC PATCH v3 2/2] cpuidle: teo: Introduce util-awareness

From: Doug Smythies
Date: Sat Nov 26 2022 - 23:30:42 EST


On 2022.11.25 10:27 Rafael wrote:
> On Mon, Oct 31, 2022 at 1:14 PM Kajetan wrote:

... [delete some] ...

>> /*
>> * Find the deepest idle state whose target residency does not exceed
>> * the current sleep length and the deepest idle state not deeper than
>> @@ -454,6 +527,11 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>> if (idx > constraint_idx)
>> idx = constraint_idx;
>>
>> + /* if the CPU is being utilized and C1 is the selected candidate */
>> + /* choose a shallower non-polling state to improve latency */
>
> Again, the kernel coding style for multi-line comments is different
> from the above.
>
>> + if (cpu_data->utilized && idx == 1)
>
> I've changed my mind with respect to adding the idx == 1 check to
> this. If the goal is to reduce latency for the "loaded" CPUs, this
> applies to deeper idle states too.

After taking idle state 0 (POLL) out of it, the energy cost for reducing
the selected idle state by 1 was still high in some cases, at least on my
Intel processor. That was mainly for idle state 2 being bumped to idle
state 1. I don't recall significant differences bumping idle state 3 to idle
state 2, but I don't know about other Intel processors.

So, there is a trade-off here where we might want to accept this higher
energy consumption for no gain in some workflows verses the higher
energy for gain in other workflows, or not.

Example 1: Higher energy, for no benefit:
Workflow: a medium load at 211 work/sleep frequency.
This data is for one thread, but I looked at up to 6 threads.
No performance metric, the work just has to finish before
the next cycle begins.
CPU frequency scaling driver: intel_pstate
CPU frequency scaling governor: powersave
No HWP.
Kernel 6.1-rc3

teo: ~14.8 watts
util-v4 without the "idx == 1" above: 16.1 watts (+8.8%)
More info:
http://smythies.com/~doug/linux/idle/teo-util/consume/dwell-v4/

Example 2: Lower energy, but no loss in performance:
Workflow: 500 threads, light load per thread,
approximately 10 hertz work/sleep frequency per thread.
CPU frequency scaling driver: intel_cpufreq
CPU frequency scaling governor: schedutil
No HWP.
Kernel 6.1-rc3

teo: ~70 watts
util-v4 without the "idx == 1" above: ~59 watts (-16%)
Execution times were the same
More info:
http://smythies.com/~doug/linux/idle/teo-util/waiter/

Note: legend util-v4-1 is util-v4 without the "idx == 1".
I have also added util-v4-1 to some of the previous results.

For reference, my testing processor:

Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz

$ grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
/sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1_ACPI
/sys/devices/system/cpu/cpu0/cpuidle/state2/name:C2_ACPI
/sys/devices/system/cpu/cpu0/cpuidle/state3/name:C3_ACPI

$ grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/desc
/sys/devices/system/cpu/cpu0/cpuidle/state0/desc:CPUIDLE CORE POLL IDLE
/sys/devices/system/cpu/cpu0/cpuidle/state1/desc:ACPI FFH MWAIT 0x0
/sys/devices/system/cpu/cpu0/cpuidle/state2/desc:ACPI FFH MWAIT 0x30
/sys/devices/system/cpu/cpu0/cpuidle/state3/desc:ACPI FFH MWAIT 0x60

... Doug