Re: [RFC PATCH v3 00/16] Core scheduling v3

From: Julien Desfossez
Date: Thu May 30 2019 - 10:21:10 EST


On 30-May-2019 10:04:39 PM, Aubrey Li wrote:
> On Thu, May 30, 2019 at 4:36 AM Vineeth Remanan Pillai
> <vpillai@xxxxxxxxxxxxxxxx> wrote:
> >
> > Third iteration of the Core-Scheduling feature.
> >
> > This version mostly fixes correctness-related issues in v2 and
> > addresses performance issues. It also addresses some crashes related
> > to cgroups and CPU hotplugging.
> >
> > We have tested and verified that incompatible processes are not
> > selected during schedule. In terms of performance, the impact
> > depends on the workload:
> > - on CPU intensive applications that use all the logical CPUs with
> > SMT enabled, enabling core scheduling performs better than nosmt.
> > - on mixed workloads with considerable io compared to cpu usage,
> > nosmt seems to perform better than core scheduling.
>
> My test scripts cannot complete on this version. I found that the
> number of CPU utilization report entries did not reach my minimum
> requirement, so I wrote a simple script to verify.
> ====================
> $ cat test.sh
> #!/bin/sh
>
> for i in `seq 1 10`
> do
> echo `date`, $i
> sleep 1
> done
> ====================
>
> Normally it works as below:
>
> Thu May 30 14:13:40 CST 2019, 1
> Thu May 30 14:13:41 CST 2019, 2
> Thu May 30 14:13:42 CST 2019, 3
> Thu May 30 14:13:43 CST 2019, 4
> Thu May 30 14:13:44 CST 2019, 5
> Thu May 30 14:13:45 CST 2019, 6
> Thu May 30 14:13:46 CST 2019, 7
> Thu May 30 14:13:47 CST 2019, 8
> Thu May 30 14:13:48 CST 2019, 9
> Thu May 30 14:13:49 CST 2019, 10
>
> When the system was running 32 sysbench threads and
> 32 gemmbench threads, it behaved as below (the system
> had ~38% idle time):
> Thu May 30 14:14:20 CST 2019, 1
> Thu May 30 14:14:21 CST 2019, 2
> Thu May 30 14:14:22 CST 2019, 3
> Thu May 30 14:14:24 CST 2019, 4 <=======x=
> Thu May 30 14:14:25 CST 2019, 5
> Thu May 30 14:14:26 CST 2019, 6
> Thu May 30 14:14:28 CST 2019, 7 <=======x=
> Thu May 30 14:14:29 CST 2019, 8
> Thu May 30 14:14:31 CST 2019, 9 <=======x=
> Thu May 30 14:14:34 CST 2019, 10 <=======x=
>
> And it got worse in the 64/64 case, where
> the system still had ~3% idle time:
> Thu May 30 14:26:40 CST 2019, 1
> Thu May 30 14:26:46 CST 2019, 2
> Thu May 30 14:26:53 CST 2019, 3
> Thu May 30 14:27:01 CST 2019, 4
> Thu May 30 14:27:03 CST 2019, 5
> Thu May 30 14:27:11 CST 2019, 6
> Thu May 30 14:27:31 CST 2019, 7
> Thu May 30 14:27:32 CST 2019, 8
> Thu May 30 14:27:41 CST 2019, 9
> Thu May 30 14:27:56 CST 2019, 10
>
> Any thoughts?

Interesting. Could you describe your test setup in more detail (commands
used, type of machine, any cgroup/pinning configuration, etc.)? I would
like to reproduce it and investigate.
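
To make runs easier to compare, a variant of your script could print how
late each iteration wakes up instead of leaving it to be read off the
timestamps. This is only a rough sketch, not something I have run against
this series: it assumes GNU date (for %N), and the overshoot_ms helper and
the ITER variable are names made up for illustration.

```shell
#!/bin/sh
# Sketch: report how much each "sleep 1" iteration overshoots.
# Assumes GNU date (%N gives nanoseconds); overshoot_ms and ITER are
# hypothetical names, not part of the original test script.

# overshoot_ms START_NS END_NS REQUESTED_MS
# Prints how many milliseconds the interval exceeded the requested sleep.
overshoot_ms() {
    echo $(( ($2 - $1) / 1000000 - $3 ))
}

ITER=${ITER:-10}
prev=$(date +%s%N)
i=1
while [ "$i" -le "$ITER" ]; do
    sleep 1
    now=$(date +%s%N)
    # The interval also includes the cost of forking date and echo,
    # so a few milliseconds of overshoot are expected even when idle.
    echo "$(date), $i, late by $(overshoot_ms "$prev" "$now" 1000) ms"
    prev=$now
    i=$((i + 1))
done
```

Running it once on an idle system and once under the 32/32 and 64/64 loads
would give a per-iteration delay that is directly comparable between
kernels.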

Thanks,

Julien