Re: [PATCH 0/5] sched/debug: decouple sched_stat tracepoints from CONFIG_SCHEDSTATS

From: Josh Poimboeuf
Date: Tue Jun 28 2016 - 22:32:53 EST


On Tue, Jun 28, 2016 at 02:43:36PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 17, 2016 at 12:43:22PM -0500, Josh Poimboeuf wrote:
> > NOTE: I didn't include any performance numbers because I wasn't able to
> > get consistent results. I tried the following on a Xeon E5-2420 v2 CPU:
> >
> > $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo -n performance > $i; done
> > $ echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
> > $ echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
> > $ echo 0 > /proc/sys/kernel/nmi_watchdog
> > $ taskset 0x10 perf stat -n -r10 perf bench sched pipe -l 1000000
> >
> > I was going to post the numbers from that, both with and without
> > SCHEDSTATS, but then when I tried to repeat the test on a different day,
> > the results were surprisingly different, with different conclusions.
> >
> > So any advice on measuring scheduler performance would be appreciated...
>
> Yeah, it's a bit of a pain in general...
>
> A) perf stat --null --repeat 50 -- perf bench sched messaging -g 50 -l 5000 | grep "seconds time elapsed"
> B) perf stat --null --repeat 50 -- taskset 1 perf bench sched pipe | grep "seconds time elapsed"
>
> 1) tip/master + 1-4
> 2) tip/master + 1-5
> 3) tip/master + 1-5 + below
>
>         1            2            3
>
> A)  4.627767855  4.650429917  4.646208062
>     4.633921933  4.641424424  4.612021058
>     4.649536375  4.663144144  4.636815948
>     4.630165619  4.649053552  4.613022902
>
> B)  1.770732957  1.789534273  1.773334291
>     1.761740716  1.795618428  1.773338681
>     1.763761666  1.822316496  1.774385589
>
>
> From this it looks like patch 5 does hurt a wee bit, but we can get most
> of that back by reordering the structure a bit. The results seem
> 'stable' across rebuilds and reboots (I've popped all patches,
> rebuilt, rebooted and re-benched 1 at the end and obtained similar
> results).
>
> Although it's possible that if we reorder first and then do 5, we'll
> just see a bigger regression. I've not bothered.

Thanks a lot for benchmarking this! And also for improving the cache
alignments. Your changes look good to me.
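
For anyone reading along, here is a rough sketch of how the numbers quoted
above can be summarized: it averages each column and reports the change
relative to column 1. The names RUNS, column_means and relative_to_baseline
are made up for illustration and are not part of perf or of this patch set.
It is only a mean comparison, not a significance test.

```python
from statistics import mean

# Elapsed times in seconds, copied from the quoted table.
# Columns: (1) tip/master + 1-4, (2) tip/master + 1-5,
#          (3) tip/master + 1-5 + below
RUNS = {
    "A": [  # perf bench sched messaging -g 50 -l 5000
        (4.627767855, 4.650429917, 4.646208062),
        (4.633921933, 4.641424424, 4.612021058),
        (4.649536375, 4.663144144, 4.636815948),
        (4.630165619, 4.649053552, 4.613022902),
    ],
    "B": [  # taskset 1 perf bench sched pipe
        (1.770732957, 1.789534273, 1.773334291),
        (1.761740716, 1.795618428, 1.773338681),
        (1.763761666, 1.822316496, 1.774385589),
    ],
}


def column_means(rows):
    """Return the mean elapsed time for each configuration (column)."""
    return [mean(col) for col in zip(*rows)]


def relative_to_baseline(means):
    """Return each mean's fractional change versus the first column."""
    base = means[0]
    return [(m - base) / base for m in means]


if __name__ == "__main__":
    for name, rows in RUNS.items():
        means = column_means(rows)
        deltas = relative_to_baseline(means)
        for idx, (m, d) in enumerate(zip(means, deltas), start=1):
            # Positive percentages mean slower than configuration 1.
            print(f"{name}) config {idx}: mean {m:.9f} s ({d:+.2%} vs 1)")
```

On these samples the pipe test (B) shows the clearer slowdown for column 2,
and column 3 recovers most of it, which matches the reading above.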

--
Josh