Re: [Regression] ext4: changes to mb_optimize_scan cause issues on Raspberry Pi

From: Thorsten Leemhuis
Date: Thu Jul 28 2022 - 03:38:10 EST




On 26.07.22 17:54, Stefan Wahren wrote:
> Hi Ojaswin,
>
> Am 26.07.22 um 08:43 schrieb Ojaswin Mujoo:
>> On Mon, Jul 25, 2022 at 09:09:32PM +0200, Stefan Wahren wrote:
>>> Hi Ojaswin,
>>>
>>> Am 25.07.22 um 17:07 schrieb Ojaswin Mujoo:
>>>> On Mon, Jul 18, 2022 at 03:29:47PM +0200, Stefan Wahren wrote:
>>>>> Hi,
>>>>>
>>>>> i noticed that since Linux 5.18 (Linux 5.19-rc6 is still affected) i'm
>>>>> unable to run "rpi-update" without massive performance regression
>>>>> on my
>>>>> Raspberry Pi 4 (multi_v7_defconfig + CONFIG_ARM_LPAE). Using Linux
>>>>> 5.17 this
>>>>> tool successfully downloads the latest firmware (> 100 MB) on my
>>>>> development
>>>>> micro SD card (Kingston 16 GB Industrial) with a ext4 filesystem
>>>>> within ~ 1
>>>>> min. The same scenario on Linux 5.18 shows the following symptoms:
>>>>>
>>>>> - download takes endlessly much time and leads to an abort by
>>>>> userspace in
>>>>> most cases because of the poor performance
>>>>> - massive system load during download even after download has been
>>>>> aborted
>>>>> (heartbeat LED goes wild)
>>>>> - whole system becomes nearly unresponsive
>>>>> - system load goes back to normal after > 10 min
>>>>> - dmesg doesn't show anything suspicious
>>>>>
>>>>> I was able to bisect this issue:
>>>>>
>>>>> ff042f4a9b050895a42cae893cc01fa2ca81b95c good
>>>>> 4b0986a3613c92f4ec1bdc7f60ec66fea135991f bad
>>>>> 25fd2d41b505d0640bdfe67aa77c549de2d3c18a bad
>>>>> b4bc93bd76d4da32600795cd323c971f00a2e788 bad
>>>>> 3fe2f7446f1e029b220f7f650df6d138f91651f2 bad
>>>>> b080cee72ef355669cbc52ff55dc513d37433600 good
>>>>> ad9c6ee642a61adae93dfa35582b5af16dc5173a good
>>>>> 9b03992f0c88baef524842e411fbdc147780dd5d bad
>>>>> aab4ed5816acc0af8cce2680880419cd64982b1d good
>>>>> 14705fda8f6273501930dfe1d679ad4bec209f52 good
>>>>> 5c93e8ecd5bd3bfdee013b6da0850357eb6ca4d8 good
>>>>> 8cb5a30372ef5cf2b1d258fce1711d80f834740a bad
>>>>> 077d0c2c78df6f7260cdd015a991327efa44d8ad bad
>>>>> cc5095747edfb054ca2068d01af20be3fcc3634f good
>>>>> 27b38686a3bb601db48901dbc4e2fc5d77ffa2c1 good
>>>>>
>>>>> commit 077d0c2c78df6f7260cdd015a991327efa44d8ad
>>>>> Author: Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx>
>>>>> Date:   Tue Mar 8 15:22:01 2022 +0530
>>>>>
>>>>> ext4: make mb_optimize_scan performance mount option work with extents
>>>>>
>>>>> If i revert this commit with Linux 5.19-rc6 the performance regression
>>>>> disappears.
>>>>>
>>>>> Please ask if you need more information.
>>>> Hi Stefan,
>>>>
>>>> Apologies, I had missed this email initially. So this particular patch
>>>> simply changed a typo in an if condition which was preventing the
>>>> mb_optimize_scan option to be enabled correctly (This feature was
>>>> introduced in the following commit [1]). I think with the
>>>> mb_optimize_scan now working, it is somehow causing the firmware
>>>> download/update to take a longer time.
>>>>
>>>> I'll try to investigate this and get back with my findings.
>>> thanks. I wasn't able to reproduce this heavy load symptoms with
>>> every SD
>>> card. Maybe this depends on the write performance of the SD card to
>>> trigger
>>> the situation (used command to measure write performance: dd
>>> if=/dev/zero
>>> of=/boot/test bs=1M count=30 oflag=dsync,direct ).
>>>
>>> I tested a Kingston consumer 32 GB which had nearly constant write
>>> performance of 13 MB/s and didn't had the heavy load symptoms. The
>>> firmware
>>> update was done in a few seconds, so hard to say that at least the
>>> performance regression is reproducible.
>>>
>>> I also tested 2x Kingston industrial 16 GB which had a floating write
>>> performance between 5 and 10 MB/s (wear leveling?) and both had the
>>> heavy
>>> load symptoms.
>>>
>>> All SD cards has been detected as ultra high speed DDR50 by the emmc2
>>> interface.
>>>
>>> Best regards
>>>
>>>> Regard,
>>>> Ojaswin
>>>>
>>>> [1]
>>>>     commit 196e402adf2e4cd66f101923409f1970ec5f1af3
>>>>     From: Harshad Shirwadkar <harshadshirwadkar@xxxxxxxxx>
>>>>     Date: Thu, 1 Apr 2021 10:21:27 -0700
>>>>     
>>>>     ext4: improve cr 0 / cr 1 group scanning
>>>>
>>>>> Regards
>>>>>
>> Thanks for the info Stefan, I'm still trying to reproduce the issue but
>> it's slightly challenging since I don't have my RPi handy at the moment.
>>
>> In the meantime, would you please try out the mb_optmize_scan=0 command
>> line options to see if that helps bypass the issue. This will help
>> confirm if the issue lies in mb_optmize_scan itself or if its something
>> else.
>>
> I run the firmware update 5 times with mb_optimize_scan=0 on my
> Raspberry Pi 4 and the industrial SD card and everytime the update worked.
>>

[CCing Jan]

FYI, Jan yesterday reported benchmark regresses that might or might not
be related Stefan's regression on the Raspberry Pi:
https://lore.kernel.org/all/20220727105123.ckwrhbilzrxqpt24@quack3/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

#regzbot monitor
https://lore.kernel.org/all/20220727105123.ckwrhbilzrxqpt24@quack3/