Re: [REGRESSION] 5-10% increase in IO latencies with nohz balance patch

From: Valentin Schneider
Date: Fri Dec 03 2021 - 07:03:43 EST


On 30/11/21 00:26, Valentin Schneider wrote:
> On 29/11/21 14:49, Josef Bacik wrote:
>> On Mon, Nov 29, 2021 at 06:31:17PM +0000, Valentin Schneider wrote:
>>> On 29/11/21 13:15, Josef Bacik wrote:
>>> > On Mon, Nov 29, 2021 at 06:03:24PM +0000, Valentin Schneider wrote:
>>> >> Would you happen to have execution traces by any chance? If not I should be
>>> >> able to get one out of that fsperf thingie.
>>> >>
>>> >
>>> > I don't, if you want to tell me how I can do it right now. I've disabled
>>> > everything on this box for now so it's literally just sitting there waiting to
>>> > have things done to it. Thanks,
>>> >
>>>
>>> I see you have Ftrace enabled in your config, so that ought to do it:
>>>
>>> trace-cmd record -e 'sched:*' -e 'cpu_idle' $your_test_cmd
>>>
>>
>> http://toxicpanda.com/performance/trace.dat
>>
>> it's like 16MiB. Enjoy,
>>
>
> Neat, thanks!
>
> Runqueue depth seems to be very rarely greater than 1, tasks with ~1ms
> runtime and lots of sleeping (also bursty kworker activity with activations
> of tens of µs), and some cores (Internet tells me that Xeon Bronze 3204
> doesn't have SMT) spend most of their time idling. Not the most apocalyptic
> task placement vs ILB selection, but the task activation patterns roughly
> look like what I was thinking of - there might be hope for me yet.
>
> I'll continue the headscratching after tomorrow's round of thinking juice.
>
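The kind of summary above (per-activation runtime, how much of the time a
CPU sits idle) can be pulled out of the sched_switch events in such a trace.
Below is a minimal sketch, assuming the events have already been extracted
from trace.dat into (timestamp, cpu, prev_pid, next_pid) records. The Switch
record and both helpers are hypothetical and are not part of trace-cmd. It
also assumes the per-CPU idle task (swapper) shows up as pid 0.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Switch(NamedTuple):
    """One already-parsed sched_switch event (hypothetical record layout)."""
    ts: float       # timestamp in seconds
    cpu: int
    prev_pid: int   # task being switched out
    next_pid: int   # task being switched in


IDLE_PID = 0  # assumption: the per-CPU idle task (swapper) is reported as pid 0


def activation_runtimes(events: List[Switch]) -> Dict[int, List[float]]:
    """Per pid, the length of each on-CPU stint, measured from the switch-in
    to the matching switch-out on the same CPU. Idle stints are skipped."""
    running = {}  # cpu -> (pid currently on that cpu, switch-in timestamp)
    runtimes: Dict[int, List[float]] = defaultdict(list)
    for ev in sorted(events, key=lambda e: e.ts):
        cur = running.get(ev.cpu)
        if cur is not None and cur[0] == ev.prev_pid and ev.prev_pid != IDLE_PID:
            runtimes[ev.prev_pid].append(ev.ts - cur[1])
        running[ev.cpu] = (ev.next_pid, ev.ts)
    return dict(runtimes)


def idle_fraction(events: List[Switch], cpu: int) -> float:
    """Fraction of the window [first event, last event] on `cpu` that was
    spent running the idle task. Returns 0.0 if there are too few events."""
    evs = sorted((e for e in events if e.cpu == cpu), key=lambda e: e.ts)
    if len(evs) < 2:
        return 0.0
    idle = 0.0
    for a, b in zip(evs, evs[1:]):
        if a.next_pid == IDLE_PID:  # idle ran from a.ts until b.ts
            idle += b.ts - a.ts
    return idle / (evs[-1].ts - evs[0].ts)
```

For example, a task that runs ~1ms, sleeps ~9ms, then runs another ~1ms
yields two ~1ms stints, and the CPU is idle for 9 of the 11ms observed:

```python
events = [
    Switch(0.000, 0, 0, 101),   # pid 101 switched in on CPU0
    Switch(0.001, 0, 101, 0),   # runs ~1ms, then goes to sleep
    Switch(0.010, 0, 0, 101),   # woken again 9ms later
    Switch(0.011, 0, 101, 0),
]
print(activation_runtimes(events))  # two entries close to 0.001 for pid 101
print(idle_fraction(events, 0))     # 0.009 / 0.011
```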

Could you give the top 4 patches, i.e. those above
8c92606ab810 ("sched/cpuacct: Make user/system times in cpuacct.stat more precise"),
a try?

https://git.gitlab.arm.com/linux-arm/linux-vs.git -b mainline/sched/nohz-next-update-regression

I gave that a quick test on the platform that caused me to write the patch
you bisected, and it looks like it didn't break the original fix. If the
above counter-measures aren't sufficient, I'll have to go poke at your
reproducers...

>> Josef