Re: [RFC PATCH v7] sched/fair: select idle cpu from idle cpumask for task wakeup

From: Mel Gorman
Date: Wed Dec 09 2020 - 09:37:36 EST


On Wed, Dec 09, 2020 at 02:24:04PM +0800, Aubrey Li wrote:
> Add idle cpumask to track idle cpus in sched domain. Every time
> a CPU enters idle, the CPU is set in idle cpumask to be a wakeup
> target. And if the CPU is not in idle, the CPU is cleared in idle
> cpumask during scheduler tick to ratelimit idle cpumask update.
>
> When a task wakes up to select an idle cpu, scanning idle cpumask
> has lower cost than scanning all the cpus in last level cache domain,
> especially when the system is heavily loaded.
>
> Benchmarks including hackbench, schbench, uperf, sysbench mysql
> and kbuild were tested on a x86 4 socket system with 24 cores per
> socket and 2 hyperthreads per core, total 192 CPUs, no regression
> found.
>

I ran this patch with tbench on top of the schedstat patches that
track SIS efficiency. The tracking adds overhead, so it's not a perfect
performance comparison, but the expectation would be that the patch
reduces the number of runqueues that are scanned.

tbench4
                          5.10.0-rc6             5.10.0-rc6
                      schedstat-v1r1          idlemask-v7r1
Hmean     1        504.76 (   0.00%)      500.14 *  -0.91%*
Hmean     2       1001.22 (   0.00%)      970.37 *  -3.08%*
Hmean     4       1930.56 (   0.00%)     1880.96 *  -2.57%*
Hmean     8       3688.05 (   0.00%)     3537.72 *  -4.08%*
Hmean     16      6352.71 (   0.00%)     6439.53 *   1.37%*
Hmean     32     10066.37 (   0.00%)    10124.65 *   0.58%*
Hmean     64     12846.32 (   0.00%)    11627.27 *  -9.49%*
Hmean     128    22278.41 (   0.00%)    22304.33 *   0.12%*
Hmean     256    21455.52 (   0.00%)    20900.13 *  -2.59%*
Hmean     320    21802.38 (   0.00%)    21928.81 *   0.58%*

Not a very encouraging result. The schedstats indicate:

                                  5.10.0-rc6      5.10.0-rc6
                              schedstat-v1r1   idlemask-v7r1
Ops TTWU Count                 5599714302.00   5589495123.00
Ops TTWU Local                 2687713250.00   2563662550.00
Ops SIS Search                 5596677950.00   5586381168.00
Ops SIS Domain Search          3268344934.00   3229088045.00
Ops SIS Scanned               15909069113.00  16568899405.00
Ops SIS Domain Scanned        13580736097.00  14211606282.00
Ops SIS Failures               2944874939.00   2843113421.00
Ops SIS Core Search             262853975.00    311781774.00
Ops SIS Core Hit                185189656.00    216097102.00
Ops SIS Core Miss                77664319.00     95684672.00
Ops SIS Recent Used Hit         124265515.00    146021086.00
Ops SIS Recent Used Miss        338142547.00    403547579.00
Ops SIS Recent Attempts         462408062.00    549568665.00
Ops SIS Search Efficiency              35.18           33.72
Ops SIS Domain Search Eff              24.07           22.72
Ops SIS Fast Success Rate              41.60           42.20
Ops SIS Success Rate                   47.38           49.11
Ops SIS Recent Success Rate            26.87           26.57

The field I would expect to decrease is SIS Domain Scanned, the number
of runqueues that were examined, but it's actually worse, and graphing
over time shows it's worse for the client thread counts. select_idle_cpu()
is definitely being called because "Domain Search" is 10 times higher than
"Core Search" and "Core Miss" is non-zero.

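As a rough cross-check of the figures above, the derived rows appear to
be simple ratios of the raw counters. The formulas in this sketch are
inferred from the numbers in the table and are an assumption, not taken
from the schedstat patches themselves; the variable and function names
are made up for illustration.

```python
# Raw counters copied from the schedstat table, as
# (schedstat-v1r1, idlemask-v7r1) pairs.
SEARCH = (5596677950, 5586381168)             # SIS Search
DOMAIN_SEARCH = (3268344934, 3229088045)      # SIS Domain Search
SCANNED = (15909069113, 16568899405)          # SIS Scanned
DOMAIN_SCANNED = (13580736097, 14211606282)   # SIS Domain Scanned
CORE_SEARCH = (262853975, 311781774)          # SIS Core Search


def pct(num, den):
    """Return num/den as a percentage."""
    return 100.0 * num / den


def derived(i):
    """Derived metrics for kernel i (0 = baseline, 1 = patched).

    Assumed formulas: efficiency = searches / runqueues scanned.
    """
    return {
        "search_eff": pct(SEARCH[i], SCANNED[i]),
        "domain_search_eff": pct(DOMAIN_SEARCH[i], DOMAIN_SCANNED[i]),
        # Average number of runqueues examined per domain search.
        "rq_per_domain_search": DOMAIN_SCANNED[i] / DOMAIN_SEARCH[i],
        # How much more often the domain is searched than the core.
        "domain_to_core": DOMAIN_SEARCH[i] / CORE_SEARCH[i],
    }


baseline, patched = derived(0), derived(1)
for name in baseline:
    print(f"{name:22s} {baseline[name]:8.2f} {patched[name]:8.2f}")
```

Under these assumed formulas the patched kernel examines roughly 4.4
runqueues per domain search against roughly 4.2 for the baseline, which
matches the drop in "SIS Domain Search Eff" from 24.07 to 22.72.
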
I suspect the issue is that the mask is only marked busy from the tick
context which is a very wide window. If select_idle_cpu() picks an idle
CPU from the mask, it's still marked as idle in the mask.
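
To illustrate the suspicion, here is a deliberately simplified toy
model, not the patch's code. It assumes every CPU is idle at the last
tick, that a burst of wakeups arrives before the next tick, and that
each wakeup scans the mask in CPU order. Real scans start near the
target CPU, are bounded, and CPUs go idle again between ticks, so only
the trend is meaningful. The function name and the clear_on_select
option are hypothetical.

```python
def scan_cost(n_cpus, wakeups_per_tick, clear_on_select):
    """Count runqueues examined for a burst of wakeups between two ticks."""
    idle_mask = set(range(n_cpus))  # mask as of the last tick: all idle
    busy = set()                    # ground truth: CPUs now running a task
    scanned = 0
    for _ in range(wakeups_per_tick):
        # sorted() takes a snapshot, so changing idle_mask below is safe.
        for cpu in sorted(idle_mask):
            scanned += 1            # one runqueue examined
            if cpu not in busy:
                busy.add(cpu)       # the woken task lands here
                if clear_on_select:
                    # Hypothetical variant: update the mask immediately.
                    idle_mask.discard(cpu)
                break               # wakeup placed, stop scanning
    return scanned


# Mask only updated from the tick: stale entries are rescanned.
print(scan_cost(8, 8, clear_on_select=False))
# Mask updated when a CPU is picked: each wakeup scans one runqueue.
print(scan_cost(8, 8, clear_on_select=True))
```

In this model the tick-only variant examines 36 runqueues for 8 wakeups
on 8 CPUs, while clearing the bit at selection time examines 8. Whether
clearing at selection time is the right fix for the patch is a separate
question; the sketch only shows how a stale mask can inflate the scan
count.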

--
Mel Gorman
SUSE Labs