Re: [PATCH v2] sched/fair: check for idle core

From: Julia Lawall
Date: Sat Feb 06 2021 - 12:21:59 EST




On Mon, 25 Jan 2021, Vincent Guittot wrote:

> On Mon, 25 Jan 2021 at 10:20, Julia Lawall <julia.lawall@xxxxxxxx> wrote:
> >
> >
> >
> > On Mon, 25 Jan 2021, Mel Gorman wrote:
> >
> > > On Sun, Jan 24, 2021 at 09:38:14PM +0100, Julia Lawall wrote:
> > > >
> > > >
> > > > On Tue, 27 Oct 2020, Mel Gorman wrote:
> > > >
> > > > > On Thu, Oct 22, 2020 at 03:15:50PM +0200, Julia Lawall wrote:
> > > > > > Fixes: 11f10e5420f6 ("sched/fair: Use load instead of runnable load in wakeup path")
> > > > > > Signed-off-by: Julia Lawall <Julia.Lawall@xxxxxxxx>
> > > > > > Reviewed-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > > > > >
> > > > >
> > > > > While not a universal win, it was mostly a win or neutral. In the few cases
> > > > > where there was a problem, one benchmark I'm a bit suspicious of generally
> > > > > as occasionally it generates bad results for unknown and unpredictable
> > > > > reasons. In another, it was very machine specific and the differences
> > > > > were small in absolute time rather than relative time. Other tests on the
> > > > > same machine were fine so overall;
> > > > >
> > > > > Acked-by: Mel Gorman <mgorman@xxxxxxx>
> > > >
> > > > Recently, we have been testing the phoronix multicore benchmarks. On v5.9
> > > > with this patch, the preparation time of phoronix slows down, from ~23
> > > > seconds to ~28 seconds. In v5.11-rc4, we see 29 seconds. It's not yet
> > > > clear what causes the problem. But perhaps the patch should be removed
> > > > from v5.11, until the problem is understood.
> > > >
> > > > commit d8fcb81f1acf651a0e50eacecca43d0524984f87
> > > >
> > >
> > > I'm not 100% convinced given that it was a mix of wins and losses. In
> > > the wakeup path in general, universal wins almost never happen. It's not
> > > 100% clear from your mail what happens during the preparation phase. If
> > > it included time to download the benchmarks and install then it would be
> > > inherently variable due to network time (if download) or cache hotness
> > > (if installing/compiling). While preparation time can be interesting --
> > > for example, if preparation involves reading a lot of files from disk,
> > > it's not universally interesting when it's not the critical phase of a
> > > benchmark.
> >
> > The benchmark is completely downloaded prior to the runs. There seems to
> > be some perturbation to the activation of containerd. Normally it is
> > even: * * * *
>
> Does it impact the benchmark results too, or only the preparation prior
> to running the benchmark?
>
> >
> > and with the patch it becomes more like: * ** **
> >
> > That is, every other one is on time, and every other one is late.
> >
> > But I don't know why this happens.
> >
> > julia
> >
> > >
> > > I think it would be better to wait until the problem is fully understood
> > > to see if it's a timing artifact (e.g. a race between when prev_cpu is
> > > observed to be idle and when it is busy).
>
> I agree that a better understanding of what is happening is necessary
> before any changes

The tests were incorrect: the faster runs without the patch used the
schedutil governor. With powersave, we get the same setup time with or
without the patch, and comparable values for the metrics of the actual
benchmarks (some of which vary a lot, though).

So there is no evidence of any problem with the patch.
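
For anyone repeating this kind of comparison, here is a minimal sketch of a
pre-run check that every CPU uses the same cpufreq governor. It was not part
of the tests above. The function names read_governors and check_uniform are
hypothetical, and the sketch assumes the usual sysfs layout
(/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor).

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Map each CPU directory name (e.g. 'cpu0') to its cpufreq governor.

    Assumes the usual sysfs layout; CPUs without a cpufreq directory
    are simply absent from the result.
    """
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(cpu_root).glob(pattern)):
        cpu_name = gov_file.parent.parent.name
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


def check_uniform(governors, expected=None):
    """Return True if all CPUs share one governor.

    If 'expected' is given, that single governor must also match it.
    An empty mapping is treated as a failure, since nothing was verified.
    """
    distinct = set(governors.values())
    if len(distinct) != 1:
        return False
    return expected is None or distinct == {expected}
```

Calling check_uniform(read_governors(), "powersave") before each run, and
recording the result alongside the timings, would make a governor mismatch
between two kernel configurations visible immediately.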

julia