Re: [RFC PATCH] sched: Consolidate cpufreq updates

From: Ingo Molnar
Date: Tue Mar 26 2024 - 04:20:46 EST



* Qais Yousef <qyousef@xxxxxxxxxxx> wrote:

> Results of `perf stat --repeat 10 perf bench sched pipe` on AMD 3900X to
> verify any potential overhead because of the addition at context switch
>
> Before:
> -------
>
> Performance counter stats for 'perf bench sched pipe' (10 runs):
>
> 16,839.74 msec task-clock:u # 1.158 CPUs utilized ( +- 0.52% )
> 0 context-switches:u # 0.000 /sec
> 0 cpu-migrations:u # 0.000 /sec
> 1,390 page-faults:u # 83.903 /sec ( +- 0.06% )
> 333,773,107 cycles:u # 0.020 GHz ( +- 0.70% ) (83.72%)
> 67,050,466 stalled-cycles-frontend:u # 19.94% frontend cycles idle ( +- 2.99% ) (83.23%)
> 37,763,775 stalled-cycles-backend:u # 11.23% backend cycles idle ( +- 2.18% ) (83.09%)
> 84,456,137 instructions:u # 0.25 insn per cycle
> # 0.83 stalled cycles per insn ( +- 0.02% ) (83.01%)
> 34,097,544 branches:u # 2.058 M/sec ( +- 0.02% ) (83.52%)
> 8,038,902 branch-misses:u # 23.59% of all branches ( +- 0.03% ) (83.44%)
>
> 14.5464 +- 0.0758 seconds time elapsed ( +- 0.52% )
>
> After:
> -------
>
> Performance counter stats for 'perf bench sched pipe' (10 runs):
>
> 16,219.58 msec task-clock:u # 1.130 CPUs utilized ( +- 0.80% )
> 0 context-switches:u # 0.000 /sec
> 0 cpu-migrations:u # 0.000 /sec
> 1,391 page-faults:u # 85.163 /sec ( +- 0.06% )
> 342,768,312 cycles:u # 0.021 GHz ( +- 0.63% ) (83.36%)
> 66,231,208 stalled-cycles-frontend:u # 18.91% frontend cycles idle ( +- 2.34% ) (83.95%)
> 39,055,410 stalled-cycles-backend:u # 11.15% backend cycles idle ( +- 1.80% ) (82.73%)
> 84,475,662 instructions:u # 0.24 insn per cycle
> # 0.82 stalled cycles per insn ( +- 0.02% ) (83.05%)
> 34,067,160 branches:u # 2.086 M/sec ( +- 0.02% ) (83.67%)
> 8,042,888 branch-misses:u # 23.60% of all branches ( +- 0.07% ) (83.25%)
>
> 14.358 +- 0.116 seconds time elapsed ( +- 0.81% )

Noise caused by too many counters & the vagaries of multi-CPU scheduling is
drowning out any results here.

I'd suggest something like this to measure same-CPU context-switching
overhead:

  taskset 1 perf stat --repeat 10 -e cycles,instructions,task-clock perf bench sched pipe

.. and make sure the cpufreq governor is at 'performance' first:

  for ((cpu=0; cpu < $(nproc); cpu++)); do echo performance > /sys/devices/system/cpu/cpu$cpu/cpufreq/scaling_governor; done
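
It can be worth reading the governors back before measuring, since the
write above needs root and fails per CPU if it is not permitted. A minimal
sketch, assuming bash; the check_governors name and the CPU_SYSFS_ROOT
override are illustrative only and not part of any existing tool:

```shell
#!/usr/bin/env bash
# Sketch: print every CPU whose cpufreq governor differs from the expected
# one, and return non-zero if any such CPU is found.
# CPU_SYSFS_ROOT is a hypothetical override (handy for dry runs); it
# defaults to the real sysfs location.
check_governors() {
    local want="${1:-performance}"
    local root="${CPU_SYSFS_ROOT:-/sys/devices/system/cpu}"
    local bad=0 f cur
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip CPUs without a readable cpufreq node (or an unmatched glob).
        [ -r "$f" ] || continue
        cur="$(cat "$f")"
        if [ "$cur" != "$want" ]; then
            echo "${f}: ${cur}"
            bad=1
        fi
    done
    return "$bad"
}
```

Running `check_governors performance && echo ok` prints "ok" only if no
readable scaling_governor file reports a different governor. Note that it
also succeeds when no such files exist at all.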

With that approach you should get much, much lower noise levels, even with
just 3 runs:

 Performance counter stats for 'perf bench sched pipe' (3 runs):

    51,616,501,297      cycles           #    3.188 GHz               ( +-  0.05% )
    37,523,641,203      instructions     #    0.73  insn per cycle    ( +-  0.08% )
         16,191.01 msec task-clock       #    0.999 CPUs utilized     ( +-  0.04% )

          16.20511 +- 0.00578 seconds time elapsed  ( +-  0.04% )
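
To put the numbers side by side, the relative noise and the before/after
delta can be recomputed from the elapsed-time lines quoted in this thread.
This is only an arithmetic sketch; the rel_noise and pct_delta helper names
are made up for illustration:

```shell
#!/usr/bin/env bash
# rel_noise MEAN SPREAD: the "+-" spread as a percentage of the mean.
rel_noise() {
    awk -v m="$1" -v s="$2" 'BEGIN { printf "%.2f\n", 100 * s / m }'
}

# pct_delta BEFORE AFTER: relative change in percent (positive = faster).
pct_delta() {
    awk -v b="$1" -v a="$2" 'BEGIN { printf "%.2f\n", 100 * (b - a) / b }'
}

rel_noise 14.5464 0.0758     # "Before" run:  0.52
rel_noise 14.358 0.116       # "After" run:   0.81
rel_noise 16.20511 0.00578   # pinned run:    0.04
pct_delta 14.5464 14.358     # before/after:  1.30
```

The before/after difference of about 1.3% is less than twice the 0.5% to
0.8% spread of the unpinned runs, while the pinned run's 0.04% spread is
small enough to resolve a change of that size.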

Thanks,

Ingo