Re: [RFC PATCH 3/3] sched: Implement shared wakequeue in CFS

From: Chris Mason
Date: Thu Jun 22 2023 - 11:59:02 EST


On 6/21/23 2:03 AM, Aaron Lu wrote:
> On Wed, Jun 21, 2023 at 12:43:52AM -0500, David Vernet wrote:
>> On Wed, Jun 21, 2023 at 12:54:16PM +0800, Aaron Lu wrote:
>>> On Tue, Jun 20, 2023 at 09:43:00PM -0500, David Vernet wrote:
>>>> On Wed, Jun 21, 2023 at 10:35:34AM +0800, Aaron Lu wrote:
>>>>> On Tue, Jun 20, 2023 at 12:36:26PM -0500, David Vernet wrote:

[ ... ]

>>>> I'm not sure what we're hoping to gain by continuing to run various
>>>> netperf workloads with your specific parameters?
>>>
>>> I don't quite follow you.
>>>
>>> I thought we were in the process of figuring out why for the same
>>> workload(netperf/default_mode/nr_client=nr_cpu) on two similar
>>> machines(both are Skylake) you saw no contention while I saw some so I
>>> tried to be exact on how I run the workload.
>>
>> I just reran the workload on a 26 core / 52 thread Cooper Lake using
>> your exact command below and still don't observe any contention
>> whatsoever on the swqueue lock:
>
> Well, it's a puzzle to me.
>
> But as you said below, I guess I'll just move on.

Thanks for bringing this up Aaron. The discussion moved on to different
ways to fix the netperf triggered contention, but I wanted to toss this
out as an easy way to see the same problem:

# swqueue disabled:
# ./schbench -L -m 52 -p 512 -r 10 -t 1
Wakeup Latencies percentiles (usec) runtime 10 (s) (14674354 total samples)
20.0th: 8 (4508866 samples)
50.0th: 11 (2879648 samples)
90.0th: 35 (5865268 samples)
* 99.0th: 70 (1282166 samples)
99.9th: 110 (124040 samples)
min=1, max=9312
avg worker transfer: 28211.91 ops/sec 13.78MB/s

During the swqueue=0 run, the system was ~30% idle

# swqueue enabled:
# ./schbench -L -m 52 -p 512 -r 10 -t 1
Wakeup Latencies percentiles (usec) runtime 10 (s) (6448414 total samples)
20.0th: 30 (1383423 samples)
50.0th: 39 (1986827 samples)
90.0th: 63 (2446965 samples)
* 99.0th: 108 (567275 samples)
99.9th: 158 (57487 samples)
min=1, max=15018
avg worker transfer: 12395.27 ops/sec 6.05MB/s

During the swqueue=1 run, the CPU was at was 97% system time, all stuck
on spinlock contention in the scheduler.

This is a single socket cooperlake with 26 cores/52 threads.

The work is similar to perf pipe test, 52 messenger threads each bouncing
a message back and forth with their own private worker for a 10 second run.

Adding more messenger threads (-m 128) increases the swqueue=0 ops/sec
to about 19MB/s and drags down the swqueue=1 ops/sec to 2MB/s.

-chris