Re: [PATCH v8 2/2] sched/fair: Introduce SIS_CURRENT to wake up short task on current CPU

From: Peter Zijlstra
Date: Mon May 01 2023 - 09:49:27 EST


On Sat, Apr 29, 2023 at 07:16:56AM +0800, Chen Yu wrote:
> netperf
> =======
> case      load           baseline(std%)  compare%( std%)
> TCP_RR    56-threads      1.00 (  1.96)   +15.23 (  4.67)
> TCP_RR    112-threads     1.00 (  1.84)   +88.83 (  4.37)
> TCP_RR    168-threads     1.00 (  0.41)  +475.45 (  4.45)
> TCP_RR    224-threads     1.00 (  0.62)  +806.85 (  3.67)
> TCP_RR    280-threads     1.00 ( 65.80)  +162.66 ( 10.26)
> TCP_RR    336-threads     1.00 ( 17.30)    -0.19 ( 19.07)
> TCP_RR    392-threads     1.00 ( 26.88)    +3.38 ( 28.91)
> TCP_RR    448-threads     1.00 ( 36.43)    -0.26 ( 33.72)
> UDP_RR    56-threads      1.00 (  7.91)    +3.77 ( 17.48)
> UDP_RR    112-threads     1.00 (  2.72)   -15.02 ( 10.78)
> UDP_RR    168-threads     1.00 (  8.86)  +131.77 ( 13.30)
> UDP_RR    224-threads     1.00 (  9.54)  +178.73 ( 16.75)
> UDP_RR    280-threads     1.00 ( 15.40)  +189.69 ( 19.36)
> UDP_RR    336-threads     1.00 ( 24.09)    +0.54 ( 22.28)
> UDP_RR    392-threads     1.00 ( 39.63)    -3.90 ( 33.77)
> UDP_RR    448-threads     1.00 ( 43.57)    +1.57 ( 40.43)
>
> tbench
> ======
> case      load           baseline(std%)  compare%( std%)
> loopback  56-threads      1.00 (  0.50)   +10.78 (  0.52)
> loopback  112-threads     1.00 (  0.19)    +2.73 (  0.08)
> loopback  168-threads     1.00 (  0.09)  +173.72 (  0.47)
> loopback  224-threads     1.00 (  0.20)    -2.13 (  0.42)
> loopback  280-threads     1.00 (  0.06)    -0.77 (  0.15)
> loopback  336-threads     1.00 (  0.14)    -0.08 (  0.08)
> loopback  392-threads     1.00 (  0.17)    -0.27 (  0.86)
> loopback  448-threads     1.00 (  0.37)    +0.32 (  0.02)

So... I've been poking around with this a bit today and I'm not seeing
it. On my ancient IVB-EP (2 sockets * 10 cores * 2 threads = 40 CPUs) with
the code as in queue/sched/core I get:

netperf                 NO_WA_WEIGHT  NO_WA_BIAS    NO_SIS_CURRENT  SIS_CURRENT
--------------------------------------------------------------------------------
TCP_SENDFILE-1  : Avg:  40495.7       41899.7       42001           40783.4
TCP_SENDFILE-10 : Avg:  37218.6       37200.1       37065.1         36604.4
TCP_SENDFILE-20 : Avg:  21495.1       21516.6       21004.4         21356.9
TCP_SENDFILE-40 : Avg:  6947.24       7917.64       7079.93         7231.3
TCP_SENDFILE-80 : Avg:  4081.91       3572.48       3582.98         3615.85
TCP_STREAM-1    : Avg:  37078.1       34469.4       37134.5         35095.4
TCP_STREAM-10   : Avg:  31532.1       31265.8       31260.7         31588.1
TCP_STREAM-20   : Avg:  17848         17914.9       17996.6         17937.4
TCP_STREAM-40   : Avg:  7844.3        7201.65       7710.4          7790.62
TCP_STREAM-80   : Avg:  2518.38       2932.74       2601.51         2903.89
TCP_RR-1        : Avg:  84347.1       81056.2       81167.8         83541.3
TCP_RR-10       : Avg:  71539.1       72099.5       71123.2         69447.9
TCP_RR-20       : Avg:  51053.3       50952.4       50905.4         52157.2
TCP_RR-40       : Avg:  46370.9       46477.5       46289.2         46350.7
TCP_RR-80       : Avg:  21515.2       22497.9       22024.4         22229.2
UDP_RR-1        : Avg:  96933         100076        95997.2         96553.3
UDP_RR-10       : Avg:  83937.3       83054.3       83878.5         78998.6
UDP_RR-20       : Avg:  61974         61897.5       61838.8         62926
UDP_RR-40       : Avg:  56708.6       57053.9       56456.1         57115.2
UDP_RR-80       : Avg:  26950         27895.8       27635.2         27784.8
UDP_STREAM-1    : Avg:  52808.3       55296.8       52808.2         51908.6
UDP_STREAM-10   : Avg:  45810         42944.1       43115           43561.2
UDP_STREAM-20   : Avg:  19212.7       17572.9       18798.7         20066
UDP_STREAM-40   : Avg:  13105.1       13096.9       13070.5         13110.2
UDP_STREAM-80   : Avg:  6372.57       6367.96       6248.86         6413.09
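
(Aside, in case anyone wants to flip the same knobs: the four columns above
are just the scheduler feature bits toggled one at a time through the debugfs
features file. The snippet below is only a sketch of doing that
programmatically; it assumes debugfs is mounted at /sys/kernel/debug and a
SCHED_DEBUG build, and on older kernels the file is sched_features rather
than sched/features. Echoing the names into the file by hand works just as
well.)

/*
 * Sketch, not from the thread: flip scheduler feature bits by writing
 * their names into the debugfs features file.  Writing "NO_<FEAT>"
 * clears a bit, writing "<FEAT>" sets it; e.g. the third column above
 * is plain mainline: WA_WEIGHT, WA_BIAS, NO_SIS_CURRENT.
 */
#include <stdio.h>
#include <stdlib.h>

#define FEATURES_FILE "/sys/kernel/debug/sched/features"

static void set_feat(const char *name)
{
	FILE *f = fopen(FEATURES_FILE, "w");

	if (!f) {
		perror(FEATURES_FILE);
		exit(1);
	}
	fprintf(f, "%s\n", name);
	fclose(f);
}

int main(int argc, char **argv)
{
	int i;

	/* e.g.: ./set-feat WA_WEIGHT WA_BIAS NO_SIS_CURRENT */
	for (i = 1; i < argc; i++)
		set_feat(argv[i]);

	return 0;
}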


tbench

NO_WA_WEIGHT, NO_WA_BIAS, NO_SIS_CURRENT

Throughput 626.57 MB/sec 2 clients 2 procs max_latency=0.095 ms
Throughput 1316.08 MB/sec 5 clients 5 procs max_latency=0.106 ms
Throughput 1905.19 MB/sec 10 clients 10 procs max_latency=0.161 ms
Throughput 2428.05 MB/sec 20 clients 20 procs max_latency=0.284 ms
Throughput 2323.16 MB/sec 40 clients 40 procs max_latency=0.381 ms
Throughput 2229.93 MB/sec 80 clients 80 procs max_latency=0.873 ms

WA_WEIGHT, NO_WA_BIAS, NO_SIS_CURRENT

Throughput 575.04 MB/sec 2 clients 2 procs max_latency=0.093 ms
Throughput 1285.37 MB/sec 5 clients 5 procs max_latency=0.122 ms
Throughput 1916.10 MB/sec 10 clients 10 procs max_latency=0.150 ms
Throughput 2422.54 MB/sec 20 clients 20 procs max_latency=0.292 ms
Throughput 2361.57 MB/sec 40 clients 40 procs max_latency=0.448 ms
Throughput 2479.70 MB/sec 80 clients 80 procs max_latency=1.249 ms

WA_WEIGHT, WA_BIAS, NO_SIS_CURRENT (aka, mainline)

Throughput 649.46 MB/sec 2 clients 2 procs max_latency=0.092 ms
Throughput 1370.93 MB/sec 5 clients 5 procs max_latency=0.140 ms
Throughput 1904.14 MB/sec 10 clients 10 procs max_latency=0.470 ms
Throughput 2406.15 MB/sec 20 clients 20 procs max_latency=0.276 ms
Throughput 2419.40 MB/sec 40 clients 40 procs max_latency=0.414 ms
Throughput 2426.00 MB/sec 80 clients 80 procs max_latency=1.366 ms

WA_WEIGHT, WA_BIAS, SIS_CURRENT (aka, with patches on)

Throughput 646.55 MB/sec 2 clients 2 procs max_latency=0.104 ms
Throughput 1361.06 MB/sec 5 clients 5 procs max_latency=0.100 ms
Throughput 1889.82 MB/sec 10 clients 10 procs max_latency=0.154 ms
Throughput 2406.57 MB/sec 20 clients 20 procs max_latency=3.667 ms
Throughput 2318.00 MB/sec 40 clients 40 procs max_latency=0.390 ms
Throughput 2384.85 MB/sec 80 clients 80 procs max_latency=1.371 ms


So what's going on here? I don't see anything exciting happening at the
40 mark (which is where this box runs out of CPUs). At the same time, I
can't seem to reproduce Mike's latency pile-up either :/
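
For anyone skimming the archive without the series in front of them: what
SIS_CURRENT toggles above is, roughly, "if both the waker and the wakee have
a short average run duration, wake the wakee on the waker's CPU instead of
searching for an idle one". The below is only a standalone sketch of that
decision, not the patch itself; the real series keeps a per-task average
duration inside the scheduler and applies further guards, and the struct,
threshold and helper names here are made up for illustration.

/*
 * Standalone sketch of the SIS_CURRENT idea: when both the waker and the
 * wakee run only briefly per activation, placing the wakee on the waker's
 * CPU skips the idle-CPU search and keeps the cache warm.
 */
#include <stdbool.h>
#include <stdio.h>

/* made-up stand-in for the per-task state the series tracks */
struct task_info {
	unsigned long long avg_duration_ns;	/* average run time per activation */
	int cpu;				/* CPU the task last ran on */
};

/* made-up threshold; the series derives its cutoff from scheduler tunables */
#define SHORT_TASK_NS	(500 * 1000ULL)		/* 0.5 ms */

static bool is_short_task(const struct task_info *t)
{
	return t->avg_duration_ns < SHORT_TASK_NS;
}

/* pick a wakeup CPU: prefer the waker's CPU when both tasks are short */
static int select_wake_cpu(const struct task_info *waker,
			   const struct task_info *wakee)
{
	if (is_short_task(waker) && is_short_task(wakee))
		return waker->cpu;		/* SIS_CURRENT-style shortcut */

	return wakee->cpu;			/* fall back to the usual search */
}

int main(void)
{
	struct task_info server = { .avg_duration_ns = 100000, .cpu = 3 };
	struct task_info client = { .avg_duration_ns = 120000, .cpu = 17 };

	printf("wake client on CPU %d\n", select_wake_cpu(&server, &client));
	return 0;
}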