Re: [PATCH 14/24] workqueue: Generalize unbound CPU pods

From: K Prateek Nayak
Date: Wed Jul 05 2023 - 03:05:26 EST


Hello Tejun,

On 6/9/2023 4:20 AM, Tejun Heo wrote:
> [..snip..]
>
> Can you please test the following branch? It should have
> both bugs fixed properly.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git affinity-scopes-v2
>
> If that doesn't crash, I'd love to hear how it affects the perf regressions
> reported over the past few months.

Sorry about the delay. The detailed test results are below; they are from
a dual socket 3rd Generation EPYC system (2 x 64C/128T).

tl;dr

- Apart from tbench and netperf, the rest of the benchmarks show no
difference out of the box.

- SPECjbb2015 Multi-JVM sees a small uplift in max-jOPS with certain
affinity scopes.

- tbench and netperf seem to be unhappy throughout. None of the affinity
scopes seem to bring back the performance. I'll dig more into this.

Following are the results from running standard benchmarks on a
dual socket Zen3 (2 x 64C/128T) machine configured in different
NPS modes.

NPS modes are used to logically divide a single socket into
multiple NUMA regions.
Following is the NUMA configuration for each NPS mode on the system:

NPS1: Each socket is a NUMA node.
Total 2 NUMA nodes in the dual socket machine.

Node 0: 0-63, 128-191
Node 1: 64-127, 192-255

NPS2: Each socket is further logically divided into 2 NUMA regions.
Total 4 NUMA nodes exist across 2 sockets.

Node 0: 0-31, 128-159
Node 1: 32-63, 160-191
Node 2: 64-95, 192-223
Node 3: 96-127, 224-255

NPS4: Each socket is logically divided into 4 NUMA regions.
Total 8 NUMA nodes exist across 2 sockets.

Node 0: 0-15, 128-143
Node 1: 16-31, 144-159
Node 2: 32-47, 160-175
Node 3: 48-63, 176-191
Node 4: 64-79, 192-207
Node 5: 80-95, 208-223
Node 6: 96-111, 224-239
Node 7: 112-127, 240-255
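
For anyone who wants to regenerate the node-to-CPU mapping, here is a
minimal sketch. It assumes the enumeration the listing above follows:
each node owns a contiguous block of cores, and the SMT sibling of
CPU n is CPU n + 128. The function name and defaults are illustrative
only; on a real system "numactl -H" or "lscpu" is the authoritative source.

```python
def nps_layout(nodes_per_socket, sockets=2, cores_per_socket=64):
    """Return {node: [(first_cpu, last_cpu), (first_sibling, last_sibling)]}.

    Assumption: cores are numbered contiguously across nodes and the SMT
    sibling of a core is offset by the total core count (128 here).
    """
    total_cores = sockets * cores_per_socket
    cores_per_node = cores_per_socket // nodes_per_socket
    layout = {}
    for node in range(sockets * nodes_per_socket):
        first = node * cores_per_node
        last = first + cores_per_node - 1
        layout[node] = [(first, last),
                        (first + total_cores, last + total_cores)]
    return layout


for nps in (1, 2, 4):
    print(f"NPS{nps}:")
    for node, ranges in nps_layout(nps).items():
        print(f"  Node {node}: " + ", ".join(f"{a}-{b}" for a, b in ranges))
```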

Benchmark Results:

Kernel versions:
- base: affinity-scopes-v2 branch at
commit 18c8ae813156 ("workqueue: Disable per-cpu CPU hog detection when wq_cpu_intensive_thresh_us is 0")

- affinity_scopes: affinity-scopes-v2 branch at
commit a4da9f618d3e ("workqueue: Add "Affinity Scopes and Performance" section to documentation")
running with the default affinity scope.

~~~~~~~~~~~~~
~ hackbench ~
~~~~~~~~~~~~~

o NPS1

Test: base affinity_scopes
1-groups: 0.00 (0.00 pct) 3.68 (0.00 pct)
2-groups: 4.41 (0.00 pct) 4.40 (0.22 pct)
4-groups: 4.91 (0.00 pct) 4.87 (0.81 pct)
8-groups: 5.64 (0.00 pct) 5.74 (-1.77 pct)
16-groups: 7.72 (0.00 pct) 7.54 (2.33 pct)

o NPS2

Test: base affinity_scopes
1-groups: 3.74 (0.00 pct) 3.85 (-2.94 pct)
2-groups: 4.38 (0.00 pct) 4.34 (0.91 pct)
4-groups: 4.87 (0.00 pct) 4.80 (1.43 pct)
8-groups: 5.42 (0.00 pct) 5.40 (0.36 pct)
16-groups: 6.75 (0.00 pct) 7.02 (-4.00 pct)

o NPS4

Test: base affinity_scopes
1-groups: 3.90 (0.00 pct) 3.84 (1.53 pct)
2-groups: 4.40 (0.00 pct) 4.39 (0.22 pct)
4-groups: 4.86 (0.00 pct) 4.85 (0.20 pct)
8-groups: 5.44 (0.00 pct) 5.44 (0.00 pct)
16-groups: 7.20 (0.00 pct) 7.08 (1.66 pct)
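
A note on reading the "pct" columns: the sign is arranged so that a
positive value is an improvement. For hackbench and schbench lower is
better; for tbench, stream and netperf higher is better. The sketch
below is inferred from the numbers in the tables and is not the actual
reporting script. In particular, truncating to two decimals instead of
rounding is a guess based on rows such as schbench 31.00 -> 28.00 being
shown as 9.67.

```python
import math


def pct_change(base, new, lower_is_better=False):
    """Percentage change relative to base; positive means improvement.

    Assumption: the value is truncated (not rounded) to two decimals.
    """
    if base == 0:
        raise ValueError("base must be non-zero")
    delta = (base - new) if lower_is_better else (new - base)
    pct = delta / base * 100.0
    return math.trunc(pct * 100) / 100


# schbench-style latency (lower is better): 26.00 -> 28.00
print(pct_change(26.0, 28.0, lower_is_better=True))
# tbench-style throughput (higher is better): 6113.51 -> 5449.58
print(pct_change(6113.51, 5449.58))
```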

~~~~~~~~~~~~
~ schbench ~
~~~~~~~~~~~~

o NPS1

#workers: base affinity_scopes
1: 26.00 (0.00 pct) 26.00 (0.00 pct)
2: 26.00 (0.00 pct) 28.00 (-7.69 pct)
4: 31.00 (0.00 pct) 28.00 (9.67 pct)
8: 37.00 (0.00 pct) 37.00 (0.00 pct)
16: 49.00 (0.00 pct) 47.00 (4.08 pct)
32: 78.00 (0.00 pct) 81.00 (-3.84 pct)
64: 170.00 (0.00 pct) 173.00 (-1.76 pct)
128: 369.00 (0.00 pct) 344.00 (6.77 pct)
256: 49600.00 (0.00 pct) 48704.00 (1.80 pct)
512: 93568.00 (0.00 pct) 93824.00 (-0.27 pct)

o NPS2

#workers: base affinity_scopes
1: 24.00 (0.00 pct) 23.00 (4.16 pct)
2: 29.00 (0.00 pct) 25.00 (13.79 pct)
4: 31.00 (0.00 pct) 32.00 (-3.22 pct)
8: 43.00 (0.00 pct) 39.00 (9.30 pct)
16: 52.00 (0.00 pct) 52.00 (0.00 pct)
32: 82.00 (0.00 pct) 89.00 (-8.53 pct)
64: 179.00 (0.00 pct) 154.00 (13.96 pct)
128: 400.00 (0.00 pct) 360.00 (10.00 pct)
256: 49856.00 (0.00 pct) 48576.00 (2.56 pct)
512: 93056.00 (0.00 pct) 91520.00 (1.65 pct)

o NPS4

#workers: base affinity_scopes
1: 25.00 (0.00 pct) 22.00 (12.00 pct)
2: 26.00 (0.00 pct) 27.00 (-3.84 pct)
4: 29.00 (0.00 pct) 28.00 (3.44 pct)
8: 48.00 (0.00 pct) 44.00 (8.33 pct)
16: 55.00 (0.00 pct) 59.00 (-7.27 pct)
32: 88.00 (0.00 pct) 84.00 (4.54 pct)
64: 166.00 (0.00 pct) 173.00 (-4.21 pct)
128: 374.00 (0.00 pct) 368.00 (1.60 pct)
256: 49600.00 (0.00 pct) 49856.00 (-0.51 pct)
512: 93824.00 (0.00 pct) 93568.00 (0.27 pct)


~~~~~~~~~~
~ tbench ~
~~~~~~~~~~

o NPS1

Clients: base affinity_scopes
1 450.40 (0.00 pct) 456.71 (1.40 pct)
2 872.50 (0.00 pct) 882.38 (1.13 pct)
4 1630.13 (0.00 pct) 1605.48 (-1.51 pct)
8 3139.90 (0.00 pct) 3041.39 (-3.13 pct)
16 6113.51 (0.00 pct) 5449.58 (-10.86 pct)
32 11024.64 (0.00 pct) 9147.71 (-17.02 pct)
64 19081.96 (0.00 pct) 14843.46 (-22.21 pct)
128 30956.07 (0.00 pct) 27493.35 (-11.18 pct)
256 42829.46 (0.00 pct) 36913.54 (-13.81 pct)
512 42395.69 (0.00 pct) 36165.41 (-14.69 pct)
1024 41973.51 (0.00 pct) 38530.57 (-8.20 pct)

o NPS2

Clients: base affinity_scopes
1 451.37 (0.00 pct) 450.97 (-0.08 pct)
2 875.07 (0.00 pct) 874.08 (-0.11 pct)
4 1636.31 (0.00 pct) 1639.60 (0.20 pct)
8 3162.48 (0.00 pct) 3074.73 (-2.77 pct)
16 5794.93 (0.00 pct) 5502.22 (-5.05 pct)
32 11205.26 (0.00 pct) 8979.27 (-19.86 pct)
64 20770.79 (0.00 pct) 17151.10 (-17.42 pct)
128 30485.82 (0.00 pct) 26953.16 (-11.58 pct)
256 40161.93 (0.00 pct) 35892.11 (-10.63 pct)
512 44513.43 (0.00 pct) 38876.31 (-12.66 pct)
1024 42781.13 (0.00 pct) 38313.23 (-10.44 pct)

o NPS4

Clients: base affinity_scopes
1 451.25 (0.00 pct) 447.95 (-0.73 pct)
2 877.94 (0.00 pct) 877.93 (0.00 pct)
4 1641.74 (0.00 pct) 1653.17 (0.69 pct)
8 3140.87 (0.00 pct) 3050.94 (-2.86 pct)
16 5878.87 (0.00 pct) 5291.66 (-9.98 pct)
32 10910.11 (0.00 pct) 9745.45 (-10.67 pct)
64 18814.62 (0.00 pct) 16708.46 (-11.19 pct)
128 29238.49 (0.00 pct) 27598.00 (-5.61 pct)
256 42119.54 (0.00 pct) 38464.91 (-8.67 pct)
512 41645.81 (0.00 pct) 40330.03 (-3.15 pct)
1024 41977.06 (0.00 pct) 39540.55 (-5.80 pct)


~~~~~~~~~~
~ stream ~
~~~~~~~~~~

o NPS1

- 10 Runs:

Test: base affinity_scopes
Copy: 245676.59 (0.00 pct) 333646.71 (35.80 pct)
Scale: 206545.41 (0.00 pct) 205706.04 (-0.40 pct)
Add: 213506.82 (0.00 pct) 236739.07 (10.88 pct)
Triad: 217679.43 (0.00 pct) 249263.46 (14.50 pct)

- 100 Runs:

Test: base affinity_scopes
Copy: 318060.91 (0.00 pct) 326025.89 (2.50 pct)
Scale: 213943.40 (0.00 pct) 207647.37 (-2.94 pct)
Add: 237892.53 (0.00 pct) 232164.59 (-2.40 pct)
Triad: 245672.84 (0.00 pct) 246333.21 (0.26 pct)

o NPS2

- 10 Runs:

Test: base affinity_scopes
Copy: 296632.20 (0.00 pct) 291153.63 (-1.84 pct)
Scale: 206193.90 (0.00 pct) 216368.42 (4.93 pct)
Add: 240363.50 (0.00 pct) 245954.23 (2.32 pct)
Triad: 242748.60 (0.00 pct) 238606.20 (-1.70 pct)

- 100 Runs:

Test: base affinity_scopes
Copy: 322535.79 (0.00 pct) 315020.03 (-2.33 pct)
Scale: 217723.56 (0.00 pct) 220172.32 (1.12 pct)
Add: 248117.72 (0.00 pct) 250557.17 (0.98 pct)
Triad: 257768.66 (0.00 pct) 248264.00 (-3.68 pct)

o NPS4

- 10 Runs:

Test: base affinity_scopes
Copy: 274067.54 (0.00 pct) 302804.77 (10.48 pct)
Scale: 224944.53 (0.00 pct) 230112.39 (2.29 pct)
Add: 229318.09 (0.00 pct) 241939.54 (5.50 pct)
Triad: 230175.89 (0.00 pct) 253613.85 (10.18 pct)

- 100 Runs:

Test: base affinity_scopes
Copy: 338922.96 (0.00 pct) 348183.65 (2.73 pct)
Scale: 240262.45 (0.00 pct) 245939.67 (2.36 pct)
Add: 256968.24 (0.00 pct) 260657.01 (1.43 pct)
Triad: 262644.16 (0.00 pct) 262286.46 (-0.13 pct)

~~~~~~~~~~~
~ netperf ~
~~~~~~~~~~~

o NPS1

Test: base affinity_scopes
1-clients: 100910.82 (0.00 pct) 102553.83 (1.62 pct)
2-clients: 99777.76 (0.00 pct) 99390.14 (-0.38 pct)
4-clients: 97676.17 (0.00 pct) 95856.63 (-1.86 pct)
8-clients: 95413.11 (0.00 pct) 88801.05 (-6.92 pct)
16-clients: 88961.66 (0.00 pct) 78807.71 (-11.41 pct)
32-clients: 82199.83 (0.00 pct) 73372.46 (-10.73 pct)
64-clients: 66094.87 (0.00 pct) 58487.61 (-11.50 pct)
128-clients: 43833.63 (0.00 pct) 42005.47 (-4.17 pct)
256-clients: 38917.58 (0.00 pct) 22653.73 (-41.79 pct)

o NPS2

Test: base affinity_scopes
1-clients: 101745.99 (0.00 pct) 102703.66 (0.94 pct)
2-clients: 100013.62 (0.00 pct) 99536.20 (-0.47 pct)
4-clients: 97124.42 (0.00 pct) 95261.28 (-1.91 pct)
8-clients: 92110.60 (0.00 pct) 87714.72 (-4.77 pct)
16-clients: 84578.86 (0.00 pct) 77329.65 (-8.57 pct)
32-clients: 78272.91 (0.00 pct) 72114.77 (-7.86 pct)
64-clients: 61595.20 (0.00 pct) 58001.87 (-5.83 pct)
128-clients: 44119.18 (0.00 pct) 40057.91 (-9.20 pct)
256-clients: 36221.03 (0.00 pct) 21468.40 (-40.72 pct)

o NPS4

Test: base affinity_scopes
1-clients: 102711.93 (0.00 pct) 103244.49 (0.51 pct)
2-clients: 101655.11 (0.00 pct) 98764.88 (-2.84 pct)
4-clients: 98519.58 (0.00 pct) 94439.88 (-4.14 pct)
8-clients: 94247.56 (0.00 pct) 88618.17 (-5.97 pct)
16-clients: 87515.03 (0.00 pct) 82392.50 (-5.85 pct)
32-clients: 81486.07 (0.00 pct) 74022.13 (-9.15 pct)
64-clients: 68436.64 (0.00 pct) 60303.48 (-11.88 pct)
128-clients: 49393.57 (0.00 pct) 43924.74 (-11.07 pct)
256-clients: 41111.27 (0.00 pct) 27694.64 (-32.63 pct)

~~~~~~~~~~~~~
~ unixbench ~
~~~~~~~~~~~~~

o NPS1

base affinity_scopes
Hmean unixbench-dhry2reg-1 41194259.44 ( 0.00%) 41044431.89 ( -0.36%)
Hmean unixbench-dhry2reg-512 6252840065.42 ( 0.00%) 6244309194.01 ( -0.14%)
Amean unixbench-syscall-1 2534936.20 ( 0.00%) 2517701.13 * 0.68%*
Amean unixbench-syscall-512 8037812.87 ( 0.00%) 7379945.23 * 8.18%*
Hmean unixbench-pipe-1 2391449.08 ( 0.00%) 2392275.16 ( 0.03%)
Hmean unixbench-pipe-512 340010431.31 ( 0.00%) 339389300.96 ( -0.18%)
Hmean unixbench-spawn-1 4471.68 ( 0.00%) 4568.80 ( 2.17%)
Hmean unixbench-spawn-512 66246.39 ( 0.00%) 62380.27 * -5.84%*
Hmean unixbench-execl-1 3695.11 ( 0.00%) 3663.75 * -0.85%*
Hmean unixbench-execl-512 12526.29 ( 0.00%) 11833.41 ( -5.53%)

o NPS2

base affinity_scopes
Hmean unixbench-dhry2reg-1 40812348.19 ( 0.00%) 41044955.13 ( 0.57%)
Hmean unixbench-dhry2reg-512 6248963826.97 ( 0.00%) 6244114150.91 ( -0.08%)
Amean unixbench-syscall-1 2479433.67 ( 0.00%) 2498544.70 ( -0.77%)
Amean unixbench-syscall-512 8064530.47 ( 0.00%) 8064139.93 ( 0.00%)
Hmean unixbench-pipe-1 2393194.62 ( 0.00%) 2365328.39 ( -1.16%)
Hmean unixbench-pipe-512 339553534.72 ( 0.00%) 340930432.76 ( 0.41%)
Hmean unixbench-spawn-1 4777.52 ( 0.00%) 4975.71 ( 4.15%)
Hmean unixbench-spawn-512 67467.26 ( 0.00%) 63427.50 * -5.99%*
Hmean unixbench-execl-1 3640.89 ( 0.00%) 3636.52 ( -0.12%)
Hmean unixbench-execl-512 14182.44 ( 0.00%) 13584.16 ( -4.22%)

o NPS4

base affinity_scopes
Hmean unixbench-dhry2reg-1 41075499.61 ( 0.00%) 41222189.50 ( 0.36%)
Hmean unixbench-dhry2reg-512 6250307266.90 ( 0.00%) 6251044709.08 ( 0.01%)
Amean unixbench-syscall-1 2538714.30 ( 0.00%) 2521520.87 * 0.68%*
Amean unixbench-syscall-512 7514126.30 ( 0.00%) 7534175.47 ( -0.27%)
Hmean unixbench-pipe-1 2393641.60 ( 0.00%) 2379400.79 ( -0.59%)
Hmean unixbench-pipe-512 339424173.78 ( 0.00%) 341229694.29 * 0.53%*
Hmean unixbench-spawn-1 5421.34 ( 0.00%) 5556.23 ( 2.49%)
Hmean unixbench-spawn-512 64071.52 ( 0.00%) 65783.47 * 2.67%*
Hmean unixbench-execl-1 3629.56 ( 0.00%) 3670.13 * 1.12%*
Hmean unixbench-execl-512 13641.24 ( 0.00%) 13848.81 ( 1.52%)
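
On the Hmean/Amean prefixes in the unixbench tables: as far as I
understand the report format, they stand for the harmonic and the
arithmetic mean of the per-iteration results. A small sketch of the
difference follows; the sample values are made up for illustration
and are not taken from the runs above.

```python
from statistics import harmonic_mean, mean


def summarize(samples):
    """Return (arithmetic mean, harmonic mean) of a list of positive samples."""
    return mean(samples), harmonic_mean(samples)


# Hypothetical per-iteration scores, not real benchmark data.
samples = [40.0, 60.0]
amean, hmean = summarize(samples)
print(f"Amean: {amean:.2f}  Hmean: {hmean:.2f}")
```

The harmonic mean is never larger than the arithmetic mean, so it gives
more weight to slow iterations of a rate-like metric.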

~~~~~~~~~~~~~~~~
~ ycsb-mongodb ~
~~~~~~~~~~~~~~~~

o NPS1:

base: 298681.00 (var: 2.31%)
affinity_scopes 295106.33 (var: 2.22%) (-1.19%)

o NPS2:

base: 296570.00 (var: 1.01%)
affinity_scopes 298637.67 (var: 1.50%) (0.70%)

o NPS4:

base 297181.67 (var: 0.46%)
affinity_scopes 294253.33 (var: 0.80%) (-0.99%)

~~~~~~~~~~~~~~~~~~
~ DeathStarBench ~
~~~~~~~~~~~~~~~~~~

o NPS1:

- 1 CCD

base: 1.00 (var: 0.14%)
affinity_scopes: 1.01 (var: 0.51%) (+1.19%)

- 2 CCD

base: 1.00 (var: 0.74%)
affinity_scopes: 0.99 (var: 0.47%) (-1.02%)

- 4 CCD

base: 1.00 (var: 0.33%)
affinity_scopes: 0.99 (var: 0.47%) (-0.95%)

- 8 CCD

base: 1.00 (var: 0.62%)
affinity_scopes: 0.99 (var: 2.30%) (-1.42%)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ Benchmarks run with multiple affinity scopes ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

o NPS1

- tbench

Clients: base cpu cache numa system
1 450.40 (0.00 pct) 459.44 (2.00 pct) 457.12 (1.49 pct) 456.36 (1.32 pct) 456.75 (1.40 pct)
2 872.50 (0.00 pct) 869.68 (-0.32 pct) 890.59 (2.07 pct) 878.87 (0.73 pct) 890.14 (2.02 pct)
4 1630.13 (0.00 pct) 1621.24 (-0.54 pct) 1634.74 (0.28 pct) 1628.62 (-0.09 pct) 1646.57 (1.00 pct)
8 3139.90 (0.00 pct) 3044.58 (-3.03 pct) 3099.49 (-1.28 pct) 3081.43 (-1.86 pct) 3151.16 (0.35 pct)
16 6113.51 (0.00 pct) 5555.17 (-9.13 pct) 5465.09 (-10.60 pct) 5661.31 (-7.39 pct) 5742.58 (-6.06 pct)
32 11024.64 (0.00 pct) 9574.62 (-13.15 pct) 9282.62 (-15.80 pct) 9542.00 (-13.44 pct) 9916.66 (-10.05 pct)
64 19081.96 (0.00 pct) 15656.53 (-17.95 pct) 15176.12 (-20.46 pct) 16527.77 (-13.38 pct) 15097.97 (-20.87 pct)
128 30956.07 (0.00 pct) 28277.80 (-8.65 pct) 27662.76 (-10.63 pct) 27817.94 (-10.13 pct) 28925.78 (-6.55 pct)
256 42829.46 (0.00 pct) 38646.48 (-9.76 pct) 38355.27 (-10.44 pct) 37073.24 (-13.43 pct) 34391.01 (-19.70 pct)
512 42395.69 (0.00 pct) 36931.34 (-12.88 pct) 39259.49 (-7.39 pct) 36571.62 (-13.73 pct) 36245.55 (-14.50 pct)
1024 41973.51 (0.00 pct) 38817.07 (-7.52 pct) 38733.15 (-7.72 pct) 38864.45 (-7.40 pct) 35728.70 (-14.87 pct)

- netperf

Test: base cpu cache numa system
1-clients: 100910.82 (0.00 pct) 103440.72 (2.50 pct) 102592.36 (1.66 pct) 103199.49 (2.26 pct) 103561.90 (2.62 pct)
2-clients: 99777.76 (0.00 pct) 100414.00 (0.63 pct) 100305.89 (0.52 pct) 99890.90 (0.11 pct) 101512.46 (1.73 pct)
4-clients: 97676.17 (0.00 pct) 96624.28 (-1.07 pct) 95966.77 (-1.75 pct) 97105.22 (-0.58 pct) 97972.11 (0.30 pct)
8-clients: 95413.11 (0.00 pct) 89926.72 (-5.75 pct) 89977.14 (-5.69 pct) 91020.10 (-4.60 pct) 92458.94 (-3.09 pct)
16-clients: 88961.66 (0.00 pct) 81295.02 (-8.61 pct) 79144.83 (-11.03 pct) 80216.42 (-9.83 pct) 85439.68 (-3.95 pct)
32-clients: 82199.83 (0.00 pct) 77914.00 (-5.21 pct) 75055.66 (-8.69 pct) 76813.94 (-6.55 pct) 80768.87 (-1.74 pct)
64-clients: 66094.87 (0.00 pct) 64419.91 (-2.53 pct) 63718.37 (-3.59 pct) 60370.40 (-8.66 pct) 66179.58 (0.12 pct)
128-clients: 43833.63 (0.00 pct) 42936.08 (-2.04 pct) 44554.69 (1.64 pct) 42666.82 (-2.66 pct) 45543.69 (3.90 pct)
256-clients: 38917.58 (0.00 pct) 24807.28 (-36.25 pct) 20517.01 (-47.28 pct) 21651.40 (-44.36 pct) 23778.87 (-38.89 pct)

- SPECjbb2015 Multi-JVM

max-jOPS critical-jOPS
base: 0.00% 0.00%
smt: -1.11% -1.84%
cpu: 2.86% -1.35%
cache: 2.86% -1.66%
numa: 1.43% -1.49%
system: 0.08% -0.41%


I'll dig deeper into the tbench and netperf regressions. I'm not sure
why the regression is observed with every affinity scope. I'll look
at an IBS profile and see if anything obvious pops up. Meanwhile, if there
is any specific data you would like me to collect, or any benchmark you
would like me to run, let me know.

--
Thanks and Regards,
Prateek