Re: [PATCH v4 11/11] sched/fair: rework find_idlest_group

From: Vincent Guittot
Date: Wed Nov 20 2019 - 08:22:23 EST


Hi Qais,

On Wed, 20 Nov 2019 at 12:58, Qais Yousef <qais.yousef@xxxxxxx> wrote:
>
> Hi Vincent
>
> On 10/18/19 15:26, Vincent Guittot wrote:
> > The slow wake up path computes per sched_group statistics to select the
> > idlest group, which is quite similar to what load_balance() is doing
> > for selecting busiest group. Rework find_idlest_group() to classify the
> > sched_group and select the idlest one following the same steps as
> > load_balance().
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > ---
>
> LTP has caught a regression in the perf_event_open02 test on linux-next, and I
> bisected it to this patch.
>
> That is, with the next-20191119 tag checked out and this patch reverted on top,
> the test passes. Without the revert the test fails.
>
> I think this patch disturbs this part of the test:
>
> https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/perf_event_open/perf_event_open02.c#L209
>
> When I revert this patch count_hardware_counters() returns a non-zero value.
> But with it applied it returns 0, which indicates that the condition terminates
> earlier than the test expects.

Thanks for the report and for starting to analyse it.

>
> I'm failing to see the connection yet, but since I spent enough time bisecting
> it I thought I'd throw this out before I continue to get to the bottom of it,
> in the hope it rings a bell for you or someone else.

I will try to reproduce the problem and understand why it's failing,
because I don't have any clue about the relation between the two for now.

>
> The problem was consistently reproducible on Juno-r2.
>
> LTP was compiled from 20190930 tag using
>
> ./configure --host=aarch64-linux-gnu --prefix=~/arm64-ltp/
> make && make install
>
>
>
> *** Output of the test when it fails ***
>
> # ./perf_event_open02 -v
> at iteration:0 value:254410384 time_enabled:195570320 time_running:156044100
> perf_event_open02 0 TINFO : overall task clock: 166935520
> perf_event_open02 0 TINFO : hw sum: 1200812256, task clock sum: 667703360
> hw counters: 300202518 300202881 300203246 300203611
> task clock counters: 166927400 166926780 166925660 166923520
> perf_event_open02 0 TINFO : ratio: 3.999768
> perf_event_open02 0 TINFO : nhw: 0.000100 /* I added this extra line for debug */
> perf_event_open02 1 TFAIL : perf_event_open02.c:370: test failed (ratio was greater than )
>
>
>
> *** Output of the test when it passes (this patch reverted) ***
>
> # ./perf_event_open02 -v
> at iteration:0 value:300271482 time_enabled:177756080 time_running:177756080
> at iteration:1 value:300252655 time_enabled:166939100 time_running:166939100
> at iteration:2 value:300252877 time_enabled:166924920 time_running:166924920
> at iteration:3 value:300242545 time_enabled:166909620 time_running:166909620
> at iteration:4 value:300250779 time_enabled:166918540 time_running:166918540
> at iteration:5 value:300250660 time_enabled:166922180 time_running:166922180
> at iteration:6 value:258369655 time_enabled:167388920 time_running:143996600
> perf_event_open02 0 TINFO : overall task clock: 167540640
> perf_event_open02 0 TINFO : hw sum: 1801473873, task clock sum: 1005046160
> hw counters: 177971955 185132938 185488818 185488199 185480943 185477118 179657001 172499668 172137672 172139561
> task clock counters: 99299900 103293440 103503840 103502040 103499020 103496160 100224320 96227620 95999400 96000420
> perf_event_open02 0 TINFO : ratio: 5.998820
> perf_event_open02 0 TINFO : nhw: 6.000100 /* I added this extra line for debug */
> perf_event_open02 1 TPASS : test passed
>
> Thanks
>
> --
> Qais Yousef
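
For reference, the numbers in the two logs above can be checked with a small
sketch. It is not the LTP code: first_multiplexed() and ratio_within_limit()
are hypothetical helpers, the "ratio" formula (task clock sum divided by
overall task clock) is inferred from the logged values, and using the debug
"nhw" value as the pass limit is an assumption based on the extra debug line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One "at iteration:N ..." line from the verbose test output. */
struct probe_sample {
	uint64_t time_enabled;
	uint64_t time_running;
};

/*
 * Hypothetical helper: index of the first sample whose event did not run
 * for the whole time it was enabled (i.e. it was multiplexed), or n if
 * every sample ran the whole time. This models the value reported for
 * count_hardware_counters() in the logs, not its implementation.
 */
static size_t first_multiplexed(const struct probe_sample *s, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (s[i].time_running < s[i].time_enabled)
			return i;
	}
	return n;
}

/* "ratio" as inferred from the logs: task clock sum / overall task clock. */
static double task_clock_ratio(uint64_t task_clock_sum,
			       uint64_t overall_task_clock)
{
	assert(overall_task_clock > 0);
	return (double)task_clock_sum / (double)overall_task_clock;
}

/* Assumed pass condition: the ratio must not exceed the debug "nhw" value. */
static bool ratio_within_limit(double ratio, double limit)
{
	return ratio <= limit;
}
```

With the logged values, the failing run already shows time_running below
time_enabled at iteration 0, so the helper returns 0, and 667703360 /
166935520 gives a ratio of about 3.9998, far above the 0.0001 printed on the
debug line. In the passing run the first mismatch is at iteration 6, and
1005046160 / 167540640 gives about 5.9988, which is below 6.0001.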