RE: [RFC PATCH v4 0/2] cpuidle: teo: Introduce util-awareness

From: Doug Smythies
Date: Wed Nov 23 2022 - 23:08:43 EST


On 2022.11.21 04:23 Kajetan Puchalski wrote:

> Hi Rafael,
>
> On Wed, Nov 02, 2022 at 03:28:06PM +0000, Kajetan Puchalski wrote:
>
> [...]
>
>> v3 -> v4:
>> - remove the chunk of code skipping metrics updates when the CPU was utilized
>> - include new test results and more benchmarks in the cover letter
>
> [...]
>
> It's been some time so I just wanted to bump this, what do you think
> about this v4? Doug has already tested it, results for his machine are
> attached to the v3 thread.

Hi All,

I continued to test this, and also included the proposed ladder idle governor,
which is why I added Rui as an addressee.
However, I ran out of time. Here is what I have:

Kernel: 6.1-rc3 and with patch sets
Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
CPU scaling driver: intel_cpufreq
HWP disabled.
Unless otherwise stated, performance CPU scaling governor.
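
For anyone reproducing this, the configuration above can be confirmed from sysfs before each run. The following is a minimal Python sketch, not part of the test scripts used here. The sysfs attributes are the standard cpuidle and cpufreq ones, while the helper name `read_config` and its `root` parameter are hypothetical and only for illustration.

```python
from pathlib import Path

# Standard sysfs attributes, relative to /sys/devices/system/cpu.
# Whether each one exists depends on the kernel configuration.
ATTRS = {
    "idle_governor": "cpuidle/current_governor",
    "idle_driver": "cpuidle/current_driver",
    "scaling_governor": "cpu0/cpufreq/scaling_governor",
    "scaling_driver": "cpu0/cpufreq/scaling_driver",
}


def read_config(root="/sys/devices/system/cpu"):
    """Return the idle/cpufreq settings as a dict (None where unreadable)."""
    config = {}
    for name, rel in ATTRS.items():
        try:
            config[name] = (Path(root) / rel).read_text().strip()
        except OSError:
            # Attribute missing or not readable on this system.
            config[name] = None
    return config
```

Calling `read_config()` on the test machine and logging the result next to each data set makes it easier to tell afterwards which idle governor and CPU scaling governor a given run used.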

Legend:
teo: the current teo idle governor
util-v4: the RFC utilization teo patch set version 4.
menu: the menu idle governor
ladder-old: the current ladder idle governor
ladder: the RFC ladder patch set.

Workflow: shell-intensive serialized workloads.
Variable: PIDs per second.
Note: Single threaded.
Master reference: forced CPU affinity to 1 CPU.
Performance Results:
http://smythies.com/~doug/linux/idle/teo-util/graphs/pids-perf.png
Schedutil Results:
http://smythies.com/~doug/linux/idle/teo-util/graphs/pids-su.png

Workflow: sleeping ebizzy 128 threads.
Variable: interval (uSecs).
Performance Results:
http://smythies.com/~doug/linux/idle/teo-util/graphs/ebizzy-128-perf.png
Performance power and idle data:
http://smythies.com/~doug/linux/idle/teo-util/ebizzy/perf/
Schedutil Results:
http://smythies.com/~doug/linux/idle/teo-util/graphs/pids-su.png
Schedutil power and idle data:
http://smythies.com/~doug/linux/idle/teo-util/ebizzy/su/

Workflow: 6 core ping-pong.
Variable: size of the work packet per token transfer.
Forced CPU affinity, 16.67% load per core (6 CPUs idle, 6 busy).
Overview:
http://smythies.com/~doug/linux/idle/teo-util/graphs/6-core-ping-pong-sweep.png
Short loop times detail:
http://smythies.com/~doug/linux/idle/teo-util/graphs/6-core-ping-pong-sweep-detail-a.png
Power and idle data:
http://smythies.com/~doug/linux/idle/teo-util/ping-sweep/6-4/
The transition between 35 and 40 minutes needs some future investigation.

Workflow: periodic 73, 113, 211, 347, 401 work/sleep frequency.
Summary: Nothing interesting.
Variable: work packet (load), ramps up and then down.
Single threaded.
Power and idle data:
http://smythies.com/~doug/linux/idle/teo-util/consume/idle-3/
Higher resolution power data:
http://smythies.com/~doug/linux/idle/teo-util/consume/ps73/
http://smythies.com/~doug/linux/idle/teo-util/consume/ps113/
http://smythies.com/~doug/linux/idle/teo-util/consume/ps211/
http://smythies.com/~doug/linux/idle/teo-util/consume/ps347/
http://smythies.com/~doug/linux/idle/teo-util/consume/ps401/

Workflow: fast speed 2 pair, 4 threads ping-pong.
Variable: none, this is a dwell test.
Results:
http://smythies.com/~doug/linux/idle/teo-util/many-0-400000000-2/times.txt
Performance power and idle data:
http://smythies.com/~doug/linux/idle/teo-util/many-0-400000000-2/perf/
Schedutil power and idle data:
http://smythies.com/~doug/linux/idle/teo-util/many-0-400000000-2/su/

Workflow: medium speed 2 pair, 4 threads ping-pong.
Variable: none, this is a dwell test.
Results:
http://smythies.com/~doug/linux/idle/teo-util/many-3000-100000000-2/times.txt
Performance power and idle data:
http://smythies.com/~doug/linux/idle/teo-util/many-3000-100000000-2/perf/
Schedutil power and idle data:
http://smythies.com/~doug/linux/idle/teo-util/many-3000-100000000-2/su/

Workflow: slow speed 2 pair, 4 threads ping-pong.
Variable: none, this is a dwell test.
Results:
http://smythies.com/~doug/linux/idle/teo-util/many-1000000-342000-2/times.txt
Performance power and idle data:
http://smythies.com/~doug/linux/idle/teo-util/many-1000000-342000-2/perf/
Schedutil power and idle data:
http://smythies.com/~doug/linux/idle/teo-util/many-1000000-342000-2/su/
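
For readers unfamiliar with the ping-pong workflows above, the general idea is that a token is passed back and forth between two tasks, and the time per round trip (loop) is measured. The following is only a rough Python sketch of that idea, not the actual test program used for these results. The function name `ping_pong` and its `loops` and `work` parameters are hypothetical, and interpreter overhead means its absolute numbers are not comparable to the ones reported here.

```python
import os
import time


def ping_pong(loops=10000, work=0):
    """Pass a one-byte token between two processes.

    Returns the average time per loop in microseconds.
    `work` is a hypothetical busy-loop count done per token transfer.
    """
    a_r, a_w = os.pipe()  # parent -> child
    b_r, b_w = os.pipe()  # child -> parent
    pid = os.fork()
    if pid == 0:
        # Child: wait for the token, do some work, send it back.
        for _ in range(loops):
            os.read(a_r, 1)
            for _ in range(work):
                pass
            os.write(b_w, b"x")
        os._exit(0)
    start = time.perf_counter()
    for _ in range(loops):
        os.write(a_w, b"x")
        for _ in range(work):
            pass
        os.read(b_r, 1)  # blocks (and may idle the CPU) until the token returns
    elapsed = time.perf_counter() - start
    os.waitpid(pid, 0)
    for fd in (a_r, a_w, b_r, b_w):
        os.close(fd)
    return elapsed / loops * 1e6
```

While one side waits for the token its CPU can go idle, so the idle governor's state selection feeds directly into the loop time. Running several such pairs at once would correspond to the "2 pair, 4 threads" arrangement only loosely.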

Results summary:

Results are uSeconds per loop.
Less is better.

Slow ping pong - 2 pairs, 4 threads.

Performance:
ladder-old: Average: 2583 (-0.56%)
ladder: Average: 2617 (+0.81%)
menu: Average: 2596 Reference Time.
teo: Average: 2689 (+3.6%)
util-v4: Average: 2665 (+2.7%)

Schedutil:
ladder-old: Average: 4490 (+44%)
ladder: Average: 3296 (+5.9%)
menu: Average: 3113 Reference Time.
teo: Average: 4005 (+29%)
util-v4: Average: 3527 (+13%)

Medium ping pong - 2 pairs, 4 threads.

Performance:
ladder-old: Average: 11.8214 (+4.6%)
ladder: Average: 11.7730 (+4.2%)
menu: Average: 11.2971 Reference Time.
teo: Average: 11.355 (+0.51%)
util-v4: Average: 11.3364 (+0.35%)

Schedutil:
ladder-old: Average: 15.6813 (+30%)
ladder: Average: 15.4338 (+28%)
menu: Average: 12.0868 Reference Time.
teo: Average: 11.7367 (-2.9%)
util-v4: Average: 11.6352 (-3.7%)

Fast ping pong - 2 pairs, 4 threads.

Performance:
ladder-old: Average: 4.009 (+39%)
ladder: Average: 3.844 (+33%)
menu: Average: 2.891 Reference Time.
teo: Average: 3.053 (+5.6%)
util-v4: Average: 2.985 (+3.2%)

Schedutil:
ladder-old: Average: 5.053 (+64%)
ladder: Average: 5.278 (+71%)
menu: Average: 3.078 Reference Time.
teo: Average: 3.106 (+0.91%)
util-v4: Average: 3.15 (+2.35%)
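
The percentages in the summary are, as far as can be read from the tables, each governor's average relative to the menu governor average (the "Reference Time") for the same test and CPU scaling governor. A minimal Python sketch of that calculation, using the slow ping pong schedutil averages from above, is below. The function name `percent_vs_reference` is hypothetical.

```python
def percent_vs_reference(average, reference):
    """Percent difference of `average` relative to `reference`.

    Positive means slower than the reference (more uSeconds per loop).
    """
    return (average - reference) / reference * 100.0


# Slow ping pong, schedutil, averages in uSeconds per loop (from the summary).
slow_schedutil = {
    "ladder-old": 4490,
    "ladder": 3296,
    "menu": 3113,  # reference
    "teo": 4005,
    "util-v4": 3527,
}

reference = slow_schedutil["menu"]
for governor, average in slow_schedutil.items():
    print(f"{governor}: {percent_vs_reference(average, reference):+.1f}%")
```

Because the listed averages are rounded, recomputing from them can differ from the reported percentage in the last digit.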

... Doug