Re: [tip: sched/core] sched/fair: Multi-LLC select_idle_sibling()

From: Peter Zijlstra
Date: Tue Jun 13 2023 - 04:25:52 EST


On Thu, Jun 08, 2023 at 12:02:15AM +0530, K Prateek Nayak wrote:
> Hello Peter,
>
> Below are the benchmark results on different NPS modes for SIS_NODE
> and SIS_NODE + additional suggested changes. None of them give a
> total win. Limit helps but there are cases where it still leads to
> regression. I'll leave full details below.
>
> On 6/2/2023 12:24 PM, Peter Zijlstra wrote:
> > On Fri, Jun 02, 2023 at 10:43:37AM +0530, K Prateek Nayak wrote:
> >> Grouping near-CCX for the offerings that do not have 2CCX per CCD will
> >> prevent degenration and limit the search scope yes. Here is what I'll
> >> do, let me check if limiting search scope helps first, and then start
> >> fiddling with the topology. How does that sound?
> >
> > So my preference would be the topology based solution, since the search
> > limit is random magic numbers that happen to work for 'your' machine but
> > who knows what it'll do for some other poor architecture that happens to
> > trip this.
> >
> > That said; verifying the limit helps at all is of course a good start,
> > because if it doesn't then the topology thing will likely also not help
> > much.
>
> o NPS Modes
>
> NPS Modes are used to logically divide single socket into
> multiple NUMA region.
> Following is the NUMA configuration for each NPS mode on the system:
>
> NPS1: Each socket is a NUMA node.
> Total 2 NUMA nodes in the dual socket machine.
>
> Node 0: 0-63, 128-191
> Node 1: 64-127, 192-255
>
> - 8CCX per node

Ok, so this is a dual-socket Zen3 with 64 cores per socket, right?


> o Kernel Versions
>
> - tip - tip:sched/core at commit e2a1f85bf9f5 "sched/psi:
> Avoid resetting the min update period when it is
> unnecessary")
>
> - SIS_NODE - tip:sched/core + this patch
>
> - SIS_NODE_LIMIT - tip:sched/core + this patch + nr=4 limit for SIS_NODE
> (https://lore.kernel.org/all/20230601111326.GV4253@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/)
>
> - SIS_NODE_TOPOEXT - tip:sched/core + this patch
> + new sched domain (Multi-Multi-Core or MMC)
> (https://lore.kernel.org/all/20230601153522.GB559993@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/)
> MMC domain groups 2 nearby CCX.

OK, so you managed to get the NPS4 topology in NPS1 mode?

> o Benchmark Results
>
> Note: All benchmarks were run with boost enabled and C2 disabled.
>
> ~~~~~~~~~~~~~
> ~ hackbench ~
> ~~~~~~~~~~~~~
>
> o NPS1
>
> Test: tip SIS_NODE SIS_NODE_LIMIT SIS_NODE_TOPOEXT
> 1-groups: 3.92 (0.00 pct) 4.05 (-3.31 pct) 3.78 (3.57 pct) 3.77 (3.82 pct)
> 2-groups: 4.58 (0.00 pct) 3.84 (16.15 pct) 4.50 (1.74 pct) 4.34 (5.24 pct)
> 4-groups: 4.99 (0.00 pct) 3.98 (20.24 pct) 4.93 (1.20 pct) 5.01 (-0.40 pct)
> 8-groups: 5.67 (0.00 pct) 6.05 (-6.70 pct) 5.73 (-1.05 pct) 5.95 (-4.93 pct)
> 16-groups: 7.88 (0.00 pct) 10.56 (-34.01 pct) 7.83 (0.63 pct) 8.04 (-2.03 pct)
>
> o NPS2
>
> Test: tip SIS_NODE SIS_NODE_LIMIT SIS_NODE_TOPOEXT
> 1-groups: 3.82 (0.00 pct) 3.68 (3.66 pct) 3.87 (-1.30 pct) 3.74 (2.09 pct)
> 2-groups: 4.40 (0.00 pct) 3.61 (17.95 pct) 4.45 (-1.13 pct) 4.30 (2.27 pct)
> 4-groups: 4.84 (0.00 pct) 3.62 (25.20 pct) 4.84 (0.00 pct) 4.97 (-2.68 pct)
> 8-groups: 5.45 (0.00 pct) 6.14 (-12.66 pct) 5.40 (0.91 pct) 5.68 (-4.22 pct)
> 16-groups: 6.94 (0.00 pct) 8.77 (-26.36 pct) 6.57 (5.33 pct) 7.87 (-13.40 pct)
>
> o NPS4
>
> Test: tip SIS_NODE SIS_NODE_LIMIT SIS_NODE_TOPOEXT
> 1-groups: 3.82 (0.00 pct) 3.84 (-0.52 pct) 3.83 (-0.26 pct) 3.85 (-0.78 pct)
> 2-groups: 4.44 (0.00 pct) 4.15 (6.53 pct) 4.43 (0.22 pct) 4.18 (5.85 pct)
> 4-groups: 4.86 (0.00 pct) 4.95 (-1.85 pct) 4.88 (-0.41 pct) 4.79 (1.44 pct)
> 8-groups: 5.42 (0.00 pct) 5.80 (-7.01 pct) 5.41 (0.18 pct) 5.75 (-6.08 pct)
> 16-groups: 6.68 (0.00 pct) 9.07 (-35.77 pct) 6.72 (-0.59 pct) 8.66 (-29.64 pct)

Win for NODE_LIMIT for having the least regressions, but also no real
gains.

Given NODE_TOPO does NPS4 that should be roughtly similar to limit=2 it
should do 'better' but it doesn't, it's markedly worse... weird.

In fact, none of the NPS4 numbers make any sense, if you've already
split the whole thing into 4, you remain with 2 CCXs per node and
NODE should be NODE_LIMIT should be NODE_TOPO.

All the NODE variants should end up scanning both CCXs and performance
should really be the same.

Something's wrong there.


> ~~~~~~~~~~
> ~ tbench ~
> ~~~~~~~~~~
>
> o NPS1
>
> Clients: tip SIS_NODE SIS_NODE_LIMIT SIS_NODE_TOPOEXT
> 1 452.49 (0.00 pct) 457.94 (1.20 pct) 458.13 (1.24 pct) 447.69 (-1.06 pct)
> 2 862.44 (0.00 pct) 879.99 (2.03 pct) 881.19 (2.17 pct) 855.91 (-0.75 pct)
> 4 1604.27 (0.00 pct) 1618.87 (0.91 pct) 1628.00 (1.47 pct) 1627.14 (1.42 pct)
> 8 2966.77 (0.00 pct) 3040.90 (2.49 pct) 3037.70 (2.39 pct) 2957.91 (-0.29 pct)
> 16 5176.70 (0.00 pct) 5292.29 (2.23 pct) 5445.15 (5.18 pct) 5241.61 (1.25 pct)
> 32 8205.24 (0.00 pct) 8949.12 (9.06 pct) 8716.02 (6.22 pct) 8494.17 (3.52 pct)
> 64 13956.71 (0.00 pct) 14461.42 (3.61 pct) 13620.04 (-2.41 pct) 15045.43 (7.80 pct)
> 128 24005.50 (0.00 pct) 26052.75 (8.52 pct) 24975.03 (4.03 pct) 24008.73 (0.01 pct)
> 256 32457.61 (0.00 pct) 21999.41 (-32.22 pct) 30810.93 (-5.07 pct) 31060.12 (-4.30 pct)
> 512 34345.24 (0.00 pct) 41166.39 (19.86 pct) 30982.94 (-9.78 pct) 31864.14 (-7.22 pct)
> 1024 33432.92 (0.00 pct) 40900.84 (22.33 pct) 30953.61 (-7.41 pct) 32006.81 (-4.26 pct)
>
> o NPS2
>
> Clients: tip SIS_NODE SIS_NODE_LIMIT SIS_NODE_TOPOEXT
> 1 453.73 (0.00 pct) 451.63 (-0.46 pct) 455.97 (0.49 pct) 453.79 (0.01 pct)
> 2 861.71 (0.00 pct) 857.85 (-0.44 pct) 868.30 (0.76 pct) 850.14 (-1.34 pct)
> 4 1599.14 (0.00 pct) 1609.30 (0.63 pct) 1656.08 (3.56 pct) 1619.10 (1.24 pct)
> 8 2951.03 (0.00 pct) 2944.71 (-0.21 pct) 3034.38 (2.82 pct) 2973.52 (0.76 pct)
> 16 5080.32 (0.00 pct) 5160.39 (1.57 pct) 5173.32 (1.83 pct) 5150.99 (1.39 pct)
> 32 7900.41 (0.00 pct) 8039.13 (1.75 pct) 8105.69 (2.59 pct) 7956.45 (0.70 pct)
> 64 14629.65 (0.00 pct) 15391.08 (5.20 pct) 14546.09 (-0.57 pct) 15410.41 (5.33 pct)
> 128 23155.88 (0.00 pct) 24015.45 (3.71 pct) 24263.82 (4.78 pct) 23351.35 (0.84 pct)
> 256 33449.57 (0.00 pct) 33571.08 (0.36 pct) 32048.20 (-4.18 pct) 32869.85 (-1.73 pct)
> 512 33757.47 (0.00 pct) 39872.69 (18.11 pct) 32945.66 (-2.40 pct) 34526.17 (2.27 pct)
> 1024 34823.14 (0.00 pct) 41090.15 (17.99 pct) 32404.40 (-6.94 pct) 34522.97 (-0.86 pct)
>
> o NPS4
>
> Clients: tip SIS_NODE SIS_NODE_LIMIT SIS_NODE_TOPOEXT
> 1 450.14 (0.00 pct) 454.46 (0.95 pct) 454.53 (0.97 pct) 451.43 (0.28 pct)
> 2 863.26 (0.00 pct) 868.94 (0.65 pct) 891.89 (3.31 pct) 866.74 (0.40 pct)
> 4 1618.71 (0.00 pct) 1599.13 (-1.20 pct) 1630.29 (0.71 pct) 1610.08 (-0.53 pct)
> 8 2929.35 (0.00 pct) 3065.12 (4.63 pct) 3064.15 (4.60 pct) 3004.74 (2.57 pct)
> 16 5114.04 (0.00 pct) 5261.40 (2.88 pct) 5238.04 (2.42 pct) 5108.53 (-0.10 pct)
> 32 7912.18 (0.00 pct) 8926.77 (12.82 pct) 8382.51 (5.94 pct) 8214.73 (3.82 pct)
> 64 14424.72 (0.00 pct) 14853.61 (2.97 pct) 14273.54 (-1.04 pct) 14430.17 (0.03 pct)
> 128 23614.97 (0.00 pct) 24506.73 (3.77 pct) 24517.76 (3.82 pct) 23296.38 (-1.34 pct)
> 256 34365.13 (0.00 pct) 35538.42 (3.41 pct) 31909.66 (-7.14 pct) 31009.12 (-9.76 pct)
> 512 34215.50 (0.00 pct) 36017.49 (5.26 pct) 32696.70 (-4.43 pct) 33262.55 (-2.78 pct)
> 1024 35421.90 (0.00 pct) 35193.81 (-0.64 pct) 32611.10 (-7.93 pct) 32795.86 (-7.41 pct)

tbench likes NODE

> ~~~~~~~~~~
> ~ stream ~
> ~~~~~~~~~~
>
> - 10 Runs
>
> o NPS1
>
> Test: tip SIS_NODE SIS_NODE_LIMIT SIS_NODE_TOPOEXT
> Copy: 271317.35 (0.00 pct) 292440.22 (7.78 pct) 302540.26 (11.50 pct) 287277.25 (5.88 pct)
> Scale: 205533.77 (0.00 pct) 203362.60 (-1.05 pct) 207750.30 (1.07 pct) 205206.26 (-0.15 pct)
> Add: 221624.62 (0.00 pct) 225850.83 (1.90 pct) 233782.14 (5.48 pct) 229774.48 (3.67 pct)
> Triad: 228500.68 (0.00 pct) 225885.25 (-1.14 pct) 238331.69 (4.30 pct) 240041.53 (5.05 pct)
>
> o NPS2
>
> Test: tip SIS_NODE SIS_NODE_LIMIT SIS_NODE_TOPOEXT
> Copy: 277761.29 (0.00 pct) 301816.34 (8.66 pct) 293563.58 (5.68 pct) 308218.80 (10.96 pct)
> Scale: 215193.83 (0.00 pct) 212522.72 (-1.24 pct) 215758.66 (0.26 pct) 205678.94 (-4.42 pct)
> Add: 242725.75 (0.00 pct) 242695.13 (-0.01 pct) 246472.20 (1.54 pct) 238089.46 (-1.91 pct)
> Triad: 237253.44 (0.00 pct) 250618.57 (5.63 pct) 239405.55 (0.90 pct) 249652.73 (5.22 pct)
>
> o NPS4
>
> Test: tip SIS_NODE SIS_NODE_LIMIT SIS_NODE_TOPOEXT
> Copy: 273307.14 (0.00 pct) 255091.78 (-6.66 pct) 301926.68 (10.47 pct) 262007.26 (-4.13 pct)
> Scale: 235715.23 (0.00 pct) 222018.36 (-5.81 pct) 224881.52 (-4.59 pct) 222282.64 (-5.69 pct)
> Add: 244500.40 (0.00 pct) 230468.21 (-5.73 pct) 242625.18 (-0.76 pct) 227146.80 (-7.09 pct)
> Triad: 250600.04 (0.00 pct) 236229.50 (-5.73 pct) 258064.49 (2.97 pct) 231772.02 (-7.51 pct)
>
> - 100 Runs
>
> Test: tip SIS_NODE SIS_NODE_LIMIT SIS_NODE_TOPOEXT
> Copy: 317381.65 (0.00 pct) 318827.08 (0.45 pct) 320898.32 (1.10 pct) 318922.96 (0.48 pct)
> Scale: 214145.00 (0.00 pct) 206213.69 (-3.70 pct) 211019.12 (-1.45 pct) 210384.47 (-1.75 pct)
> Add: 239243.29 (0.00 pct) 229791.67 (-3.95 pct) 233827.11 (-2.26 pct) 236659.48 (-1.07 pct)
> Triad: 249477.76 (0.00 pct) 236843.06 (-5.06 pct) 244688.91 (-1.91 pct) 235990.67 (-5.40 pct)
>
> o NPS2
>
> Test: tip SIS_NODE SIS_NODE_LIMIT SIS_NODE_TOPOEXT
> Copy: 318082.10 (0.00 pct) 322844.91 (1.49 pct) 310350.21 (-2.43 pct) 322495.84 (1.38 pct)
> Scale: 219338.56 (0.00 pct) 218139.90 (-0.54 pct) 212288.47 (-3.21 pct) 221040.27 (0.77 pct)
> Add: 248118.20 (0.00 pct) 249826.98 (0.68 pct) 239682.55 (-3.39 pct) 253006.79 (1.97 pct)
> Triad: 247088.55 (0.00 pct) 260488.38 (5.42 pct) 247892.42 (0.32 pct) 249081.33 (0.80 pct)
>
> o NPS4
>
> Test: tip SIS_NODE SIS_NODE_LIMIT SIS_NODE_TOPOEXT
> Copy: 345396.19 (0.00 pct) 343675.74 (-0.49 pct) 346990.96 (0.46 pct) 334677.55 (-3.10 pct)
> Scale: 241521.63 (0.00 pct) 231494.70 (-4.15 pct) 236233.18 (-2.18 pct) 229159.01 (-5.11 pct)
> Add: 261157.86 (0.00 pct) 249663.86 (-4.40 pct) 253402.85 (-2.96 pct) 242257.98 (-7.23 pct)
> Triad: 267804.99 (0.00 pct) 263071.00 (-1.76 pct) 264208.15 (-1.34 pct) 256978.50 (-4.04 pct)

Again, the NPS4 reults are weird.

> ~~~~~~~~~~~
> ~ netperf ~
> ~~~~~~~~~~~
>
> o NPS1
>
> tip SIS_NODE SIS_NODE_LIMIT SIS_NODE_TOPOEXT
> 1-clients: 102839.97 (0.00 pct) 103540.33 (0.68 pct) 103769.74 (0.90 pct) 103271.77 (0.41 pct)
> 2-clients: 98428.08 (0.00 pct) 100431.67 (2.03 pct) 100555.62 (2.16 pct) 100417.11 (2.02 pct)
> 4-clients: 92298.45 (0.00 pct) 94800.51 (2.71 pct) 93706.09 (1.52 pct) 94981.10 (2.90 pct)
> 8-clients: 85618.41 (0.00 pct) 89130.14 (4.10 pct) 87677.84 (2.40 pct) 88284.61 (3.11 pct)
> 16-clients: 78722.18 (0.00 pct) 79715.38 (1.26 pct) 80488.76 (2.24 pct) 78980.88 (0.32 pct)
> 32-clients: 73610.75 (0.00 pct) 72801.41 (-1.09 pct) 72167.43 (-1.96 pct) 75077.55 (1.99 pct)
> 64-clients: 55285.07 (0.00 pct) 56184.38 (1.62 pct) 56443.79 (2.09 pct) 60689.05 (9.77 pct)
> 128-clients: 31176.92 (0.00 pct) 32830.06 (5.30 pct) 35511.93 (13.90 pct) 35638.50 (14.31 pct)
> 256-clients: 20011.44 (0.00 pct) 15135.39 (-24.36 pct) 17599.21 (-12.05 pct) 18219.29 (-8.95 pct)
>
> o NPS2
>
> tip SIS_NODE SIS_NODE_LIMIT SIS_NODE_TOPOEXT
> 1-clients: 103105.55 (0.00 pct) 101582.75 (-1.47 pct) 103077.22 (-0.02 pct) 102233.63 (-0.84 pct)
> 2-clients: 98720.29 (0.00 pct) 98537.46 (-0.18 pct) 100761.54 (2.06 pct) 99211.39 (0.49 pct)
> 4-clients: 92289.39 (0.00 pct) 94332.45 (2.21 pct) 93622.46 (1.44 pct) 93321.77 (1.11 pct)
> 8-clients: 84998.63 (0.00 pct) 87180.90 (2.56 pct) 86970.84 (2.32 pct) 86076.75 (1.26 pct)
> 16-clients: 76395.81 (0.00 pct) 80017.06 (4.74 pct) 77937.29 (2.01 pct) 75090.85 (-1.70 pct)
> 32-clients: 71110.89 (0.00 pct) 69445.86 (-2.34 pct) 69273.81 (-2.58 pct) 66885.99 (-5.94 pct)
> 64-clients: 49526.21 (0.00 pct) 50004.13 (0.96 pct) 51649.09 (4.28 pct) 51100.52 (3.17 pct)
> 128-clients: 27917.51 (0.00 pct) 30581.70 (9.54 pct) 31587.40 (13.14 pct) 33477.65 (19.91 pct)
> 256-clients: 20067.17 (0.00 pct) 26002.42 (29.57 pct) 18681.28 (-6.90 pct) 18144.96 (-9.57 pct)
>
> o NPS4
>
> tip SIS_NODE SIS_NODE_LIMIT SIS_NODE_TOPOEXT
> 1-clients: 102139.49 (0.00 pct) 103578.02 (1.40 pct) 103633.90 (1.46 pct) 101656.07 (-0.47 pct)
> 2-clients: 98259.53 (0.00 pct) 99336.70 (1.09 pct) 99720.37 (1.48 pct) 98812.86 (0.56 pct)
> 4-clients: 91576.79 (0.00 pct) 95278.30 (4.04 pct) 93688.37 (2.30 pct) 93848.94 (2.48 pct)
> 8-clients: 84742.30 (0.00 pct) 89005.65 (5.03 pct) 87703.04 (3.49 pct) 86709.29 (2.32 pct)
> 16-clients: 79540.75 (0.00 pct) 85478.97 (7.46 pct) 83195.92 (4.59 pct) 81016.24 (1.85 pct)
> 32-clients: 71166.14 (0.00 pct) 74254.01 (4.33 pct) 72422.76 (1.76 pct) 71391.62 (0.31 pct)
> 64-clients: 51763.24 (0.00 pct) 52565.56 (1.54 pct) 55159.65 (6.56 pct) 52472.91 (1.37 pct)
> 128-clients: 27829.29 (0.00 pct) 35774.61 (28.55 pct) 33738.97 (21.23 pct) 34564.10 (24.20 pct)
> 256-clients: 24185.37 (0.00 pct) 27215.35 (12.52 pct) 17675.87 (-26.91 pct) 24937.66 (3.11 pct)

NPS4 is weird again, but mostly wins.

Based on the NPS1 results I'd say this one goes to TOPO

> ~~~~~~~~~~~~~~~~
> ~ ycsb-mongodb ~
> ~~~~~~~~~~~~~~~~
>
> o NPS1
>
> tip: 131070.33 (var: 2.84%)
> SIS_NODE: 131070.33 (var: 2.84%) (0.00%)
> SIS_NODE_LIMIT: 137227.00 (var: 4.97%) (4.69%)
> SIS_NODE_TOPOEXT: 133529.67 (var: 0.98%) (1.87%)
>
> o NPS2
>
> tip: 133693.67 (var: 1.69%)
> SIS_NODE: 134173.00 (var: 4.07%) (0.35%)
> SIS_NODE_LIMIT: 134124.67 (var: 2.20%) (0.32%)
> SIS_NODE_TOPOEXT: 133747.33 (var: 2.49%) (0.04%)
>
> o NPS4
>
> tip: 132913.67 (var: 1.97%)
> SIS_NODE: 133697.33 (var: 1.69%) (0.58%)
> SIS_NODE_LIMIT: 133307.33 (var: 1.03%) (0.29%)
> SIS_NODE_TOPOEXT: 133426.67 (var: 3.60%) (0.38%)
>
> ~~~~~~~~~~~~~
> ~ unixbench ~
> ~~~~~~~~~~~~~
>
> o NPS1
>
> kernel tip SIS_NODE SIS_NODE_LIMIT SIS_NODE_TOPOEXT
> Hmean unixbench-dhry2reg-1 41322625.19 ( 0.00%) 41224388.33 ( -0.24%) 41142898.66 ( -0.43%) 41222168.97 ( -0.24%)
> Hmean unixbench-dhry2reg-512 6252491108.60 ( 0.00%) 6240160851.68 ( -0.20%) 6262714194.10 ( 0.16%) 6259553403.67 ( 0.11%)
> Amean unixbench-syscall-1 2501398.27 ( 0.00%) 2577323.43 * -3.04%* 2498697.20 ( 0.11%) 2541279.77 * -1.59%*
> Amean unixbench-syscall-512 8120524.00 ( 0.00%) 7512955.87 * 7.48%* 7447849.67 * 8.28%* 7477129.17 * 7.92%*
> Hmean unixbench-pipe-1 2359346.02 ( 0.00%) 2392308.62 * 1.40%* 2407625.04 * 2.05%* 2334146.94 * -1.07%*
> Hmean unixbench-pipe-512 338790322.61 ( 0.00%) 337711432.92 ( -0.32%) 340399941.24 ( 0.48%) 339008490.26 ( 0.06%)
> Hmean unixbench-spawn-1 4261.52 ( 0.00%) 4164.90 ( -2.27%) 4929.26 * 15.67%* 5111.16 * 19.94%*
> Hmean unixbench-spawn-512 64328.93 ( 0.00%) 62257.64 * -3.22%* 63740.04 * -0.92%* 63291.18 * -1.61%*
> Hmean unixbench-execl-1 3677.73 ( 0.00%) 3652.08 ( -0.70%) 3642.56 * -0.96%* 3671.98 ( -0.16%)
> Hmean unixbench-execl-512 11984.83 ( 0.00%) 13585.65 * 13.36%* 12496.80 ( 4.27%) 12306.01 ( 2.68%)
>
> o NPS2
>
> kernel tip SIS_NODE SIS_NODE_LIMIT SIS_NODE_TOPOEXT
> Hmean unixbench-dhry2reg-1 41311787.29 ( 0.00%) 41412946.27 ( 0.24%) 41035150.98 ( -0.67%) 41371003.93 ( 0.14%)
> Hmean unixbench-dhry2reg-512 6243873272.76 ( 0.00%) 6256893083.32 ( 0.21%) 6236751880.89 ( -0.11%) 6235047089.83 ( -0.14%)
> Amean unixbench-syscall-1 2503190.70 ( 0.00%) 2576854.30 * -2.94%* 2496464.80 * 0.27%* 2540298.77 * -1.48%*
> Amean unixbench-syscall-512 8012388.13 ( 0.00%) 7503196.87 * 6.36%* 7493284.60 * 6.48%* 7495117.73 * 6.46%*
> Hmean unixbench-pipe-1 2340486.25 ( 0.00%) 2388946.63 ( 2.07%) 2412344.33 * 3.07%* 2360277.30 ( 0.85%)
> Hmean unixbench-pipe-512 338965319.79 ( 0.00%) 337225630.07 ( -0.51%) 339053027.04 ( 0.03%) 336939353.18 * -0.60%*
> Hmean unixbench-spawn-1 5241.83 ( 0.00%) 5246.00 ( 0.08%) 4718.45 * -9.98%* 4967.96 * -5.22%*
> Hmean unixbench-spawn-512 65799.86 ( 0.00%) 64817.15 * -1.49%* 66418.37 ( 0.94%) 66820.63 * 1.55%*
> Hmean unixbench-execl-1 3670.65 ( 0.00%) 3622.36 * -1.32%* 3661.04 ( -0.26%) 3660.08 ( -0.29%)
> Hmean unixbench-execl-512 13682.00 ( 0.00%) 13699.90 ( 0.13%) 14103.91 ( 3.08%) 12960.11 ( -5.28%)
>
> o NPS4
>
> kernel tip SIS_NODE SIS_NODE_LIMIT SIS_NODE_TOPOEXT
> Hmean unixbench-dhry2reg-1 41025577.99 ( 0.00%) 40879469.78 ( -0.36%) 41082700.61 ( 0.14%) 41260407.54 ( 0.57%)
> Hmean unixbench-dhry2reg-512 6255568261.91 ( 0.00%) 6258326086.80 ( 0.04%) 6252223940.32 ( -0.05%) 6259088809.43 ( 0.06%)
> Amean unixbench-syscall-1 2507165.37 ( 0.00%) 2579108.77 * -2.87%* 2488617.40 * 0.74%* 2517574.40 ( -0.42%)
> Amean unixbench-syscall-512 7458476.50 ( 0.00%) 7502528.67 * -0.59%* 7978379.53 * -6.97%* 7580369.27 * -1.63%*
> Hmean unixbench-pipe-1 2369301.21 ( 0.00%) 2392905.29 * 1.00%* 2410432.93 * 1.74%* 2347814.20 ( -0.91%)
> Hmean unixbench-pipe-512 340299405.72 ( 0.00%) 339139980.01 * -0.34%* 340403992.95 ( 0.03%) 338708678.82 * -0.47%*
> Hmean unixbench-spawn-1 5571.78 ( 0.00%) 5423.03 ( -2.67%) 5462.82 ( -1.96%) 5543.08 ( -0.52%)
> Hmean unixbench-spawn-512 63999.96 ( 0.00%) 63485.41 ( -0.80%) 64730.98 * 1.14%* 67486.34 * 5.45%*
> Hmean unixbench-execl-1 3587.15 ( 0.00%) 3624.44 * 1.04%* 3638.74 * 1.44%* 3639.57 * 1.46%*
> Hmean unixbench-execl-512 14184.17 ( 0.00%) 13784.17 ( -2.82%) 13104.71 * -7.61%* 13598.22 ( -4.13%)
>
> ~~~~~~~~~~~~~~~~~~
> ~ DeathStarBench ~
> ~~~~~~~~~~~~~~~~~~
>
> o NPS1
> CCD Scaling tip SIS_NODE SIS_NODE_LIMIT SIS_NODE_TOPOEXT
> 1 1 0% 0.30% 0.83% 0.79%
> 1 1 0% 0.17% 2.53% 0.91%
> 1 1 0% -0.40% 2.90% 1.61%
> 1 1 0% -7.95% 1.19% -1.56%
>
> o NPS2
>
> CCD Scaling tip SIS_NODE SIS_NODE_LIMIT SIS_NODE_TOPOEXT
> 1 1 0% 0.34% -0.73% -0.62%
> 1 1 0% -0.02% 0.14% -1.15%
> 1 1 0% -12.34% -9.64% -7.80%
> 1 1 0% -12.41% -1.03% -9.85%
>
> Note: In NPS2, 8 CCD case shows 10% run to run variation.
>
> o NPS4
>
> CCD Scaling tip SIS_NODE SIS_NODE_LIMIT SIS_NODE_TOPOEXT
> 1 1 0% -1.32% -0.71% -1.09%
> 1 1 0% -1.53% -1.11% -1.73%
> 1 1 0% 7.19% -3.47% 5.75%
> 1 1 0% -4.66% -1.91% -7.52%

LIMIT seems to do well for the NPS1 case, but how come it falls apart
for NPS2 ?!? that doesn't realy make sense, does it?

And again NPS4 is all over the place :/

>
> --
> If you would like me to collect any more information during any of the
> above benchmark runs, please let me know.

dizzy with numbers ....

Perhaps see if you can figure out why NPS4 is so weird, there's only 2
CCXs to go around per node on that thing, the various results should not
be all over the map.

Perhaps pick hackbenc since it shows the problem and is easy and quick
to run?

Also, can you share the TOPOEXT code?