Re: [PATCH v4 00/10] sched/fair: rework the CFS load balance

From: Mel Gorman
Date: Wed Oct 30 2019 - 12:24:48 EST


On Mon, Oct 21, 2019 at 09:50:38AM +0200, Ingo Molnar wrote:
> > <SNIP>
>
> Thanks, that's an excellent series!
>

Agreed despite the level of whining and complaining I made during the
review.

> I've queued it up in sched/core with a handful of readability edits to
> comments and changelogs.
>
> There are some upstreaming caveats though, I expect this series to be a
> performance regression magnet:
>
> - load_balance() and wake-up changes invariably are such: some workloads
> only work/scale well by accident, and if we touch the logic it might
> flip over into a less advantageous scheduling pattern.
>
> - In particular the changes from balancing and waking on runnable load
> to full load that includes blocking *will* shift IO-intensive
> workloads that your tests don't fully capture, I believe. You also made
> idle balancing more aggressive in essence - which might reduce cache
> locality for some workloads.
>
> A full run on Mel Gorman's magic scalability test-suite would be super
> useful ...
>

I queued this back on the 21st and it took this long for me to get back
to it.

What I tested did not include the fix for the last patch so I cannot say
the data is that useful. I also failed to include something that exercises
the IO paths in a way that idles rapidly, as that can catch interesting
details (usually cpufreq-related but sometimes load-balancing related).
There was no real thinking behind this decision, I just used an old
collection of tests to get a general feel for the series.

Most of the results were performance-neutral, with some notable gains
(kernel compiles were 1-6% faster depending on the -j count). Hackbench
saw a disproportionate gain in terms of performance but I tend to be wary
of hackbench as improving it is rarely a universal win.
There tends to be some jitter around the point where a NUMA node's worth
of CPUs gets overloaded. tbench (mmtests configuration network-tbench) on
a NUMA machine showed gains for low thread counts and high thread counts
but a loss near the boundary where a single node would get overloaded.
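
For anyone who wants to poke at the tbench case, something along these
lines should reproduce it with mmtests (the config path and flags are from
memory, so double-check against the mmtests README; "baseline" and
"patched" are just arbitrary run names):

    git clone https://github.com/gormanm/mmtests
    cd mmtests
    # boot the baseline kernel, then
    ./run-mmtests.sh --config configs/config-network-tbench baseline
    # boot the patched kernel, then
    ./run-mmtests.sh --config configs/config-network-tbench patched
    # results end up under work/log/ for the bundled compare scripts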

Some NAS-related workloads saw a drop in performance on NUMA machines
but the size class might be too small to be certain; I'd have to rerun
with the D class to be sure. The strangest drop in performance was the
elapsed time to run the git test suite (mmtests configuration
workload-shellscripts modified to use a fresh XFS partition), which took
17.61% longer to execute on a UMA Skylake machine. This *might* be due
to the missing fix because it is mostly a single-task workload.

I'm not going to go through the results in detail because I think another
full round of testing would be required to take the fix into account. I'd
also prefer to wait to see if the review results in any material change
to the series.

--
Mel Gorman
SUSE Labs