Re: [PATCH v8 2/2] sched/fair: Introduce SIS_CURRENT to wake up short task on current CPU

From: Peter Zijlstra
Date: Tue May 02 2023 - 07:55:23 EST


On Mon, May 01, 2023 at 11:52:47PM +0800, Chen Yu wrote:

> > So,... I've been poking around with this a bit today and I'm not seeing
> > it. On my ancient IVB-EP (2*10*2) with the code as in
> > queue/sched/core I get:
> >
> > netperf                  NO_SIS_CURRENT   SIS_CURRENT      %
> > ------------------------ ---------------- ---------------- ----------
> > TCP_SENDFILE-1 : Avg: 42001 40783.4 -2.89898
> > TCP_SENDFILE-10 : Avg: 37065.1 36604.4 -1.24295
> > TCP_SENDFILE-20 : Avg: 21004.4 21356.9 1.67822
> > TCP_SENDFILE-40 : Avg: 7079.93 7231.3 2.13802
> > TCP_SENDFILE-80 : Avg: 3582.98 3615.85 0.917393

> > TCP_STREAM-1 : Avg: 37134.5 35095.4 -5.49112
> > TCP_STREAM-10 : Avg: 31260.7 31588.1 1.04732
> > TCP_STREAM-20 : Avg: 17996.6 17937.4 -0.328951
> > TCP_STREAM-40 : Avg: 7710.4 7790.62 1.04041
> > TCP_STREAM-80 : Avg: 2601.51 2903.89 11.6232

> > TCP_RR-1 : Avg: 81167.8 83541.3 2.92419
> > TCP_RR-10 : Avg: 71123.2 69447.9 -2.35549
> > TCP_RR-20 : Avg: 50905.4 52157.2 2.45907
> > TCP_RR-40 : Avg: 46289.2 46350.7 0.13286
> > TCP_RR-80 : Avg: 22024.4 22229.2 0.929878

> > UDP_RR-1 : Avg: 95997.2 96553.3 0.579288
> > UDP_RR-10 : Avg: 83878.5 78998.6 -5.81782
> > UDP_RR-20 : Avg: 61838.8 62926 1.75812
> > UDP_RR-40 : Avg: 56456.1 57115.2 1.16746
> > UDP_RR-80 : Avg: 27635.2 27784.8 0.541339

> > UDP_STREAM-1 : Avg: 52808.2 51908.6 -1.70352
> > UDP_STREAM-10 : Avg: 43115 43561.2 1.03491
> > UDP_STREAM-20 : Avg: 18798.7 20066 6.74142
> > UDP_STREAM-40 : Avg: 13070.5 13110.2 0.303737
> > UDP_STREAM-80 : Avg: 6248.86 6413.09 2.62816


> > tbench

> > WA_WEIGHT, WA_BIAS, NO_SIS_CURRENT (aka, mainline)
> >
> > Throughput 649.46 MB/sec 2 clients 2 procs max_latency=0.092 ms
> > Throughput 1370.93 MB/sec 5 clients 5 procs max_latency=0.140 ms
> > Throughput 1904.14 MB/sec 10 clients 10 procs max_latency=0.470 ms
> > Throughput 2406.15 MB/sec 20 clients 20 procs max_latency=0.276 ms
> > Throughput 2419.40 MB/sec 40 clients 40 procs max_latency=0.414 ms
> > Throughput 2426.00 MB/sec 80 clients 80 procs max_latency=1.366 ms
> >
> > WA_WEIGHT, WA_BIAS, SIS_CURRENT (aka, with patches on)
> >
> > Throughput 646.55 MB/sec 2 clients 2 procs max_latency=0.104 ms
> > Throughput 1361.06 MB/sec 5 clients 5 procs max_latency=0.100 ms
> > Throughput 1889.82 MB/sec 10 clients 10 procs max_latency=0.154 ms
> > Throughput 2406.57 MB/sec 20 clients 20 procs max_latency=3.667 ms
> > Throughput 2318.00 MB/sec 40 clients 40 procs max_latency=0.390 ms
> > Throughput 2384.85 MB/sec 80 clients 80 procs max_latency=1.371 ms
> >
> >
> > So what's going on here? I don't see anything exciting happening at the
> > 40 mark. At the same time, I can't seem to reproduce Mike's latency pile
> > up either :/
> >
> Thank you very much for trying this patch. This patch was found to mainly
> benefit systems with a large number of CPUs in one LLC. Previously I tested
> it on Sapphire Rapids (2x56C/224T) and Ice Lake Server (2x32C/128T)[1], and it
> seemed to show a benefit on them. The benefit seems to come from:
> 1. reducing the waker stacking among many CPUs within 1 LLC

I should be seeing that at 10 cores per LLC. And when we look at the
tbench results (never the most stable -- let me run a few more of those)
it looks like SIS_CURRENT is actually making that worse.

That latency spike at 20 clients seems stable for me -- and 3 ms is rather
small; I've seen it go up to 11 ms (though typically in the 4-6 ms range).
This does not happen with NO_SIS_CURRENT and is a fairly big point against
these patches.

> 2. reducing the C2C overhead within 1 LLC

This is due to how L3 became non-inclusive with Skylake? I can't see
that because I don't have anything that recent :/
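
To make the mechanism concrete, here is a standalone toy model of the
heuristic as I read the series -- wake the wakee on the waker's CPU when
both tasks have a short average run duration. All names and the 500us
cutoff below are illustrative stand-ins, not the actual kernel code:

	/*
	 * Toy model of the SIS_CURRENT decision, compiled standalone.
	 * All identifiers and the 500us cutoff are illustrative only.
	 */
	#include <stdbool.h>
	#include <stdio.h>

	#define SHORT_TASK_NS	500000ULL  /* stand-in migration-cost cutoff */

	struct task_model {
		unsigned long long avg_run_ns;	/* average runtime per activation */
	};

	static bool is_short_task(const struct task_model *p)
	{
		return p->avg_run_ns < SHORT_TASK_NS;
	}

	/*
	 * If both the waker (current) and the wakee are short-running, skip
	 * the idle-CPU scan and queue the wakee on the waker's CPU; otherwise
	 * fall back to the usual select_idle_sibling() style search.
	 */
	static bool wake_on_current_cpu(const struct task_model *waker,
					const struct task_model *wakee)
	{
		return is_short_task(waker) && is_short_task(wakee);
	}

	int main(void)
	{
		struct task_model waker = { .avg_run_ns = 100000 };	/* 100us */
		struct task_model wakee = { .avg_run_ns = 200000 };	/* 200us */

		printf("wake on current CPU: %s\n",
		       wake_on_current_cpu(&waker, &wakee) ? "yes" : "no");
		return 0;
	}

The trade-off the numbers above are probing is exactly that: staying on
the current CPU saves the idle-CPU scan and keeps the data warm, at the
cost of the wakee potentially queueing behind the waker.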

> So far I have not received any performance difference reports from LKP on
> desktop test boxes. Let me queue the full test on some desktops to confirm
> whether this change has any impact on them.

Right, so I've updated my netperf results above to include the relative
difference between NO_SIS_CURRENT and SIS_CURRENT, and I see some losses
at the low end. On servers that gets compensated at the high end, but
desktops tend not to get there much.
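
(For reference, the % column is simply
(SIS_CURRENT - NO_SIS_CURRENT) / NO_SIS_CURRENT * 100, so negative
numbers mean SIS_CURRENT is slower.)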