Re: [RFC PATCH v3 0/3] sched: Skip queued wakeups only when L2 is shared

From: Mathieu Desnoyers
Date: Fri Aug 25 2023 - 10:04:00 EST


On 8/25/23 06:11, Swapnil Sapkal wrote:
Hello Mathieu,

On 8/22/2023 5:01 PM, Mathieu Desnoyers wrote:
This series improves performance of scheduler wakeups on large systems
by skipping queued wakeups only when CPUs share their L2 cache, rather
than when they share their LLC.

The speedup mainly reproduces on workloads which have at least *some*
idle time (because idle time significantly increases the number of
migrations, and thus of remote wakeups), *and* which have sufficient
load to cause contention on the runqueue locks.
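As a rough illustration of the decision this series changes, here is a minimal userspace sketch loosely modeled on the structure of ttwu_queue_cond() in kernel/sched/core.c. Everything in it is an assumption for illustration: struct cpu_topo, enum share_level, model_cpus_share() and should_queue_wakeup() are hypothetical names, the per-CPU cache ids are taken as precomputed, and the kernel's cpu_active() and affinity checks are left out. "Queued" means the wakeup is handed to the target CPU's wake list; otherwise the waker takes the target runqueue lock itself.

```c
#include <stdbool.h>

/* Hypothetical per-CPU view used by this model (not kernel structures). */
struct cpu_topo {
	int l2_id;      /* id of the L2 cache this CPU belongs to */
	int llc_id;     /* id of the last-level cache this CPU belongs to */
	int nr_running; /* tasks currently on this CPU's runqueue */
};

/* Cache level that decides whether a wakeup may skip the wake list. */
enum share_level { SHARE_LLC, SHARE_L2 };

static bool model_cpus_share(const struct cpu_topo *t, int a, int b,
			     enum share_level level)
{
	if (level == SHARE_L2)
		return t[a].l2_id == t[b].l2_id;
	return t[a].llc_id == t[b].llc_id;
}

/*
 * Returns true if the wakeup should be queued on the target's wake list
 * (the target CPU activates the task itself), false if the waker should
 * lock the target runqueue and enqueue the task directly.
 */
static bool should_queue_wakeup(const struct cpu_topo *t, int this_cpu,
				int target, enum share_level level)
{
	/* Not sharing this cache level: avoid touching the remote runqueue. */
	if (!model_cpus_share(t, this_cpu, target, level))
		return true;

	/* Local wakeup: nothing to hand off. */
	if (this_cpu == target)
		return false;

	/* Idle target: let it do the activation while the waker stays busy. */
	if (t[target].nr_running == 0)
		return true;

	return false;
}
```

With CPUs 0-1 and 2-3 on separate L2s under a single LLC, a wakeup from CPU 0 to a busy CPU 2 is done directly under SHARE_LLC but queued under SHARE_L2, which is the case where the series expects less contention on the runqueue locks.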

Feedback is welcome,

I ran some micro-benchmarks as part of testing this series. Here are the
observations:

- Hackbench shows improvements with both this series and Aaron's patch
  relative to the 6.5-rc1 baseline, mainly at 8 and 16 groups (up to
  30.65% and 28.81% respectively at 16 groups).

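- tbench and netperf show some dip in performance in the highly
  overloaded cases, most visibly with both patch sets applied (-14.38%
  for tbench at 2048 clients, -24.84% for netperf at 512 clients).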

- Other micro-benchmarks show broadly similar performance with these
  patches, apart from a -25.28% stream-10 Triad result with this series
  alone.

Those results look promising! Thanks for testing!

Mathieu



o System Details

- 4th Generation EPYC System
- 2 x 128C/256T
- NPS1 mode

o Kernels

base:                                6.5.0-rc1
base + mathieu-queued-wakeup:        6.5.0-rc1 + Mathieu's patches [1]
base + aaron-tg-load-avg:            6.5.0-rc1 + Aaron's patch [2]
base + queued-wakeup + tg-load-avg:  6.5.0-rc1 + Mathieu's patches [1] + Aaron's patch [2]

[References]

[1] "sched: Skip queued wakeups only when L2 is shared"
(https://lore.kernel.org/all/20230822113133.643238-1-mathieu.desnoyers@xxxxxxxxxxxx/)
[2] "Reduce cost of accessing tg->load_avg"
(https://lore.kernel.org/lkml/20230823060832.454842-1-aaron.lu@xxxxxxxxx/)

==================================================================
Test          : hackbench
Units         : Time in seconds
Interpretation: Lower is better
Statistic     : AMean
==================================================================
Test:       6.5.0-rc1 (base)   base + mathieu-queued-wakeup   base + aaron-tg-load-avg   base + queued-wakeup + tg-load-avg
 1-groups:  22.15 (0.00 pct)   22.46 (-1.39 pct)              22.35 (-0.90 pct)          21.20 (4.28 pct)
 2-groups:  22.76 (0.00 pct)   21.78 (4.30 pct)               22.60 (0.70 pct)           21.90 (3.77 pct)
 4-groups:  22.12 (0.00 pct)   22.02 (0.45 pct)               22.22 (-0.45 pct)          21.94 (0.81 pct)
 8-groups:  24.80 (0.00 pct)   22.36 (9.83 pct)               22.99 (7.29 pct)           22.00 (11.29 pct)
16-groups:  31.09 (0.00 pct)   21.56 (30.65 pct)              22.13 (28.81 pct)          20.60 (33.74 pct)
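The "(x pct)" figures are changes relative to the 6.5.0-rc1 baseline, signed so that a positive value is an improvement: for hackbench (lower is better) pct = (base - new) / base * 100, and for throughput tables such as tbench (higher is better) pct = (new - base) / base * 100. The sketch below reproduces that arithmetic. pct_change() is a hypothetical helper, not part of any benchmark harness, and truncating toward zero to two decimals is inferred from the published numbers (22.15 s to 22.46 s is listed as -1.39 pct, where rounding would give -1.40).

```c
#include <stdbool.h>

/*
 * Hypothetical helper: percent change of `val` relative to `base`,
 * positive when `val` is an improvement. The result is truncated toward
 * zero to two decimals, which matches the published values; binary
 * floating point may still be off by 0.01 when the exact result lands
 * on a two-decimal boundary.
 */
static double pct_change(double base, double val, bool lower_is_better)
{
	double delta = lower_is_better ? base - val : val - base;
	double pct = delta / base * 100.0;

	/* Converting to long long discards the fractional part (toward zero). */
	return (double)(long long)(pct * 100.0) / 100.0;
}
```

For instance, pct_change(31.09, 21.56, true) yields 30.65, the 16-groups hackbench entry for base + mathieu-queued-wakeup, and pct_change(145378.00, 124471.00, false) yields -14.38, the 2048-client tbench entry with both patch sets applied.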

==================================================================
Test          : tbench
Units         : Throughput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
Clients:  6.5.0-rc1 (base)       base + mathieu-queued-wakeup   base + aaron-tg-load-avg   base + queued-wakeup + tg-load-avg
      1   261.49 (0.00 pct)      261.18 (-0.11 pct)             262.29 (0.30 pct)          257.80 (-1.41 pct)
      2   514.08 (0.00 pct)      521.30 (1.40 pct)              517.66 (0.69 pct)          510.96 (-0.60 pct)
      4   1002.51 (0.00 pct)     988.81 (-1.36 pct)             995.04 (-0.74 pct)         987.74 (-1.47 pct)
      8   1978.74 (0.00 pct)     1966.60 (-0.61 pct)            1991.85 (0.66 pct)         1941.39 (-1.88 pct)
     16   3864.14 (0.00 pct)     3952.03 (2.27 pct)             3914.80 (1.31 pct)         3873.88 (0.25 pct)
     32   7473.19 (0.00 pct)     7602.38 (1.72 pct)             7585.94 (1.50 pct)         7423.44 (-0.66 pct)
     64   14335.10 (0.00 pct)    14313.17 (-0.15 pct)           14474.67 (0.97 pct)        14030.63 (-2.12 pct)
    128   27275.73 (0.00 pct)    25176.80 (-7.69 pct)           28066.53 (2.89 pct)        25045.53 (-8.17 pct)
    256   41688.17 (0.00 pct)    44373.40 (6.44 pct)            43779.37 (5.01 pct)        41427.00 (-0.62 pct)
    512   137481.33 (0.00 pct)   136466.67 (-0.73 pct)          134824.00 (-1.93 pct)      141280.00 (2.76 pct)
   1024   140534.00 (0.00 pct)   141916.33 (0.98 pct)           137008.33 (-2.50 pct)      126319.33 (-10.11 pct)
   2048   145378.00 (0.00 pct)   145479.33 (0.06 pct)           138763.67 (-4.54 pct)      124471.00 (-14.38 pct)

==================================================================
Test          : netperf
Units         : Throughput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
              6.5.0-rc1 (base)      base + mathieu-queued-wakeup   base + aaron-tg-load-avg   base + queued-wakeup + tg-load-avg
  1-clients:  59642.88 (0.00 pct)   61647.37 (3.36 pct)            61186.24 (2.58 pct)        59099.11 (-0.91 pct)
  2-clients:  59349.65 (0.00 pct)   60896.01 (2.60 pct)            60582.49 (2.07 pct)        62738.47 (5.70 pct)
  4-clients:  59197.37 (0.00 pct)   60457.29 (2.12 pct)            63042.52 (6.49 pct)        60879.58 (2.84 pct)
  8-clients:  61977.66 (0.00 pct)   60389.92 (-2.56 pct)           62078.15 (0.16 pct)        60314.65 (-2.68 pct)
 16-clients:  61518.83 (0.00 pct)   61143.51 (-0.61 pct)           60946.08 (-0.93 pct)       59388.78 (-3.46 pct)
 32-clients:  58230.81 (0.00 pct)   58653.20 (0.72 pct)            58594.14 (0.62 pct)        58188.52 (-0.07 pct)
 64-clients:  58050.92 (0.00 pct)   57834.55 (-0.37 pct)           58183.51 (0.22 pct)        57565.75 (-0.83 pct)
128-clients:  54324.55 (0.00 pct)   54385.60 (0.11 pct)            54913.43 (1.08 pct)        53917.11 (-0.75 pct)
256-clients:  70155.29 (0.00 pct)   69390.68 (-1.08 pct)           70097.50 (-0.08 pct)       64410.66 (-8.18 pct)
512-clients:  61511.77 (0.00 pct)   61480.99 (-0.05 pct)           54493.82 (-11.40 pct)      46227.05 (-24.84 pct)

==================================================================
Test          : stream-10
Units         : Bandwidth, MB/s
Interpretation: Higher is better
Statistic     : HMean
==================================================================
Test:   6.5.0-rc1 (base)       base + mathieu-queued-wakeup   base + aaron-tg-load-avg   base + queued-wakeup + tg-load-avg
 Copy:  353336.76 (0.00 pct)   352956.36 (-0.10 pct)          349583.67 (-1.06 pct)      351152.80 (-0.61 pct)
Scale:  353474.88 (0.00 pct)   354582.35 (0.31 pct)           350543.75 (-0.82 pct)      353275.74 (-0.05 pct)
  Add:  371984.24 (0.00 pct)   372824.87 (0.22 pct)           369173.72 (-0.75 pct)      370483.63 (-0.40 pct)
Triad:  372625.41 (0.00 pct)   278389.62 (-25.28 pct)         369504.06 (-0.83 pct)      369070.11 (-0.95 pct)

==================================================================
Test          : stream-100
Units         : Bandwidth, MB/s
Interpretation: Higher is better
Statistic     : HMean
==================================================================
Test:   6.5.0-rc1 (base)       base + mathieu-queued-wakeup   base + aaron-tg-load-avg   base + queued-wakeup + tg-load-avg
 Copy:  353476.35 (0.00 pct)   354954.50 (0.41 pct)           354614.56 (0.32 pct)       353512.71 (0.01 pct)
Scale:  353214.73 (0.00 pct)   354884.12 (0.47 pct)           355841.17 (0.74 pct)       353220.53 (0.00 pct)
  Add:  370755.48 (0.00 pct)   372292.72 (0.41 pct)           375307.35 (1.22 pct)       369917.77 (-0.22 pct)
Triad:  370652.02 (0.00 pct)   372732.11 (0.56 pct)           375718.85 (1.36 pct)       369926.26 (-0.19 pct)

==================================================================
Test          : schbench (old)
Units         : 99th percentile latency in us
Interpretation: Lower is better
Statistic     : Median
==================================================================
#workers:  6.5.0-rc1 (base)     base + mathieu-queued-wakeup   base + aaron-tg-load-avg   base + queued-wakeup + tg-load-avg
  1:       56.00 (0.00 pct)     58.00 (-3.57 pct)              60.00 (-7.14 pct)          60.00 (-7.14 pct)
  2:       61.00 (0.00 pct)     56.00 (8.19 pct)               59.00 (3.27 pct)           60.00 (1.63 pct)
  4:       64.00 (0.00 pct)     62.00 (3.12 pct)               66.00 (-3.12 pct)          64.00 (0.00 pct)
  8:       96.00 (0.00 pct)     78.00 (18.75 pct)              76.00 (20.83 pct)          93.00 (3.12 pct)
 16:       98.00 (0.00 pct)     95.00 (3.06 pct)               98.00 (0.00 pct)           95.00 (3.06 pct)
 32:       137.00 (0.00 pct)    144.00 (-5.10 pct)             133.00 (2.91 pct)          130.00 (5.10 pct)
 64:       206.00 (0.00 pct)    210.00 (-1.94 pct)             200.00 (2.91 pct)          217.00 (-5.33 pct)
128:       348.00 (0.00 pct)    347.00 (0.28 pct)              413.00 (-18.67 pct)        366.00 (-5.17 pct)
256:       679.00 (0.00 pct)    669.00 (1.47 pct)              669.00 (1.47 pct)          675.00 (0.58 pct)
512:       1366.00 (0.00 pct)   1366.00 (0.00 pct)             1442.00 (-5.56 pct)        1430.00 (-4.68 pct)


==================================================================
Test          : schbench (new)
Units         : 99th percentile latency in us
Interpretation: Lower is better
Statistic     : Median
==================================================================
Metric: wakeup_lat_summary
#workers:  6.5.0-rc1 (base)      base + mathieu-queued-wakeup   base + aaron-tg-load-avg   base + queued-wakeup + tg-load-avg
  1:       15.00 (0.00 pct)      15.00 (0.00 pct)               16.00 (-6.66 pct)          17.00 (-13.33 pct)
  2:       16.00 (0.00 pct)      16.00 (0.00 pct)               17.00 (-6.25 pct)          17.00 (-6.25 pct)
  4:       17.00 (0.00 pct)      17.00 (0.00 pct)               15.00 (11.76 pct)          17.00 (0.00 pct)
  8:       11.00 (0.00 pct)      13.00 (-18.18 pct)             11.00 (0.00 pct)           11.00 (0.00 pct)
 16:       11.00 (0.00 pct)      11.00 (0.00 pct)               10.00 (9.09 pct)           9.00 (18.18 pct)
 32:       11.00 (0.00 pct)      11.00 (0.00 pct)               11.00 (0.00 pct)           11.00 (0.00 pct)
 64:       10.00 (0.00 pct)      11.00 (-10.00 pct)             10.00 (0.00 pct)           10.00 (0.00 pct)
128:       11.00 (0.00 pct)      12.00 (-9.09 pct)              12.00 (-9.09 pct)          11.00 (0.00 pct)
256:       117.00 (0.00 pct)     162.00 (-38.46 pct)            90.00 (23.07 pct)          103.00 (11.96 pct)
512:       22496.00 (0.00 pct)   21664.00 (3.69 pct)            22368.00 (0.56 pct)        21408.00 (4.83 pct)

Metric: request_lat_summary
#workers:  6.5.0-rc1 (base)      base + mathieu-queued-wakeup   base + aaron-tg-load-avg   base + queued-wakeup + tg-load-avg
  1:       6872.00 (0.00 pct)    6872.00 (0.00 pct)             6792.00 (1.16 pct)         6856.00 (0.23 pct)
  2:       6824.00 (0.00 pct)    6824.00 (0.00 pct)             6872.00 (-0.70 pct)        6856.00 (-0.46 pct)
  4:       6824.00 (0.00 pct)    6808.00 (0.23 pct)             6872.00 (-0.70 pct)        6824.00 (0.00 pct)
  8:       6824.00 (0.00 pct)    6824.00 (0.00 pct)             6872.00 (-0.70 pct)        6824.00 (0.00 pct)
 16:       6824.00 (0.00 pct)    6840.00 (-0.23 pct)            6872.00 (-0.70 pct)        6840.00 (-0.23 pct)
 32:       6840.00 (0.00 pct)    6840.00 (0.00 pct)             6888.00 (-0.70 pct)        6856.00 (-0.23 pct)
 64:       6840.00 (0.00 pct)    6872.00 (-0.46 pct)            6888.00 (-0.70 pct)        6872.00 (-0.46 pct)
128:       12272.00 (0.00 pct)   12784.00 (-4.17 pct)           13200.00 (-7.56 pct)       12016.00 (2.08 pct)
256:       13328.00 (0.00 pct)   13392.00 (-0.48 pct)           13712.00 (-2.88 pct)       13552.00 (-1.68 pct)
512:       88832.00 (0.00 pct)   86400.00 (2.73 pct)            88192.00 (0.72 pct)        85632.00 (3.60 pct)

Metric: rps_summary (requests per second: higher is better, but the pct values below follow this section's lower-is-better convention, so a positive pct means fewer requests per second)
#workers:  6.5.0-rc1 (base)      base + mathieu-queued-wakeup   base + aaron-tg-load-avg   base + queued-wakeup + tg-load-avg
  1:       297.00 (0.00 pct)     297.00 (0.00 pct)              297.00 (0.00 pct)          299.00 (-0.67 pct)
  2:       601.00 (0.00 pct)     603.00 (-0.33 pct)             595.00 (0.99 pct)          601.00 (0.00 pct)
  4:       1206.00 (0.00 pct)    1206.00 (0.00 pct)             1190.00 (1.32 pct)         1206.00 (0.00 pct)
  8:       2412.00 (0.00 pct)    2412.00 (0.00 pct)             2396.00 (0.66 pct)         2420.00 (-0.33 pct)
 16:       4840.00 (0.00 pct)    4824.00 (0.33 pct)             4792.00 (0.99 pct)         4840.00 (0.00 pct)
 32:       9648.00 (0.00 pct)    9648.00 (0.00 pct)             9584.00 (0.66 pct)         9680.00 (-0.33 pct)
 64:       19360.00 (0.00 pct)   19296.00 (0.33 pct)            19168.00 (0.99 pct)        19296.00 (0.33 pct)
128:       37952.00 (0.00 pct)   35264.00 (7.08 pct)            36672.00 (3.37 pct)        38080.00 (-0.33 pct)
256:       41408.00 (0.00 pct)   41536.00 (-0.30 pct)           39744.00 (4.01 pct)        40896.00 (1.23 pct)
512:       36288.00 (0.00 pct)   36800.00 (-1.41 pct)           35264.00 (2.82 pct)        35776.00 (1.41 pct)

Tested-by: Swapnil Sapkal <Swapnil.Sapkal@xxxxxxx>


Thanks,

Mathieu

Mathieu Desnoyers (3):
   sched: Rename cpus_share_cache to cpus_share_llc
   sched: Introduce cpus_share_l2c (v3)
   sched: ttwu_queue_cond: skip queued wakeups across different l2 caches
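Patch 2 adds cpus_share_l2c(). In mainline, cpus_share_cache() (renamed cpus_share_llc() by patch 1) compares a per-CPU sd_llc_id, which update_top_cache_domain() sets to the first CPU of the LLC scheduling domain. The sketch below assumes an L2 helper can follow the same pattern; it is a userspace model, not the code from patch 2. The sibling-bitmask input, MODEL_NR_CPUS, l2_id[], first_cpu(), build_l2_ids() and model_cpus_share_l2c() are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8 /* hypothetical machine size for the model */

/* Per-CPU L2 id: the lowest-numbered CPU sharing that L2. */
static int l2_id[MODEL_NR_CPUS];

/* Index of the lowest set bit in mask, or -1 if the mask is empty. */
static int first_cpu(uint64_t mask)
{
	for (int i = 0; i < 64; i++)
		if (mask & (UINT64_C(1) << i))
			return i;
	return -1;
}

/*
 * l2_siblings[cpu] is a bitmask of the CPUs sharing cpu's L2 cache,
 * including cpu itself. An empty mask falls back to the CPU's own
 * number, so that CPU is treated as having a private L2.
 */
static void build_l2_ids(const uint64_t l2_siblings[MODEL_NR_CPUS])
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		int first = first_cpu(l2_siblings[cpu]);

		l2_id[cpu] = first >= 0 ? first : cpu;
	}
}

/* Two CPUs share an L2 iff they were assigned the same id. */
static bool model_cpus_share_l2c(int this_cpu, int that_cpu)
{
	return l2_id[this_cpu] == l2_id[that_cpu];
}
```

With CPUs paired on L2 (masks 0x03, 0x03, 0x0c, 0x0c, 0x30, 0x30, 0xc0, 0xc0), CPUs 2 and 3 both get id 2, so model_cpus_share_l2c(2, 3) is true and model_cpus_share_l2c(1, 2) is false. Precomputing the ids keeps the per-wakeup check to two loads and a compare rather than a mask walk.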

Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Valentin Schneider <vschneid@xxxxxxxxxx>
Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
Cc: Ben Segall <bsegall@xxxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: Daniel Bristot de Oliveira <bristot@xxxxxxxxxx>
Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
Cc: Juri Lelli <juri.lelli@xxxxxxxxxx>
Cc: Swapnil Sapkal <Swapnil.Sapkal@xxxxxxx>
Cc: Aaron Lu <aaron.lu@xxxxxxxxx>
Cc: Julien Desfossez <jdesfossez@xxxxxxxxxxxxxxxx>
Cc: x86@xxxxxxxxxx

  block/blk-mq.c                 |  2 +-
  include/linux/sched/topology.h | 10 ++++++++--
  kernel/sched/core.c            | 14 +++++++++++---
  kernel/sched/fair.c            |  8 ++++----
  kernel/sched/sched.h           |  2 ++
  kernel/sched/topology.c        | 32 +++++++++++++++++++++++++++++---
  6 files changed, 55 insertions(+), 13 deletions(-)

--
Thanks and Regards,
Swapnil

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com