Re: stalling IO regression since linux 5.12, through 5.18

From: Chris Murphy
Date: Wed Aug 17 2022 - 11:04:17 EST




On Wed, Aug 17, 2022, at 10:53 AM, Ming Lei wrote:
> On Wed, Aug 17, 2022 at 10:34:38AM -0400, Chris Murphy wrote:
>>
>>
>> On Wed, Aug 17, 2022, at 8:06 AM, Ming Lei wrote:
>>
>> > blk-mq debugfs log is usually helpful for io stall issue, care to post
>> > the blk-mq debugfs log:
>> >
>> > (cd /sys/kernel/debug/block/$disk && find . -type f -exec grep -aH . {} \;)
>>
>> This is only sda
>> https://drive.google.com/file/d/1aAld-kXb3RUiv_ShAvD_AGAFDRS03Lr0/view?usp=sharing
>
> From the log, there isn't any in-flight IO request.
>
> So please confirm that it is collected after the IO stall is triggered.

Yes, iotop reports no reads or writes at the time of collection. IO pressure is at 99% for auditd, systemd-journald, rsyslogd, and postgresql, with increasing pressure from all the qemu processes.
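
For anyone wanting to capture comparable pressure numbers, here is a rough sketch that reads the per-cgroup PSI files. It assumes cgroup v2 mounted at /sys/fs/cgroup and a kernel with PSI enabled; the io_pressure function name is made up for illustration, and printing the "full" avg10 value rather than "some" is an arbitrary choice.

```shell
# Sketch: list cgroups by "full" avg10 IO pressure, highest first.
# The default root /sys/fs/cgroup is an assumption (cgroup v2 mount point).
io_pressure() {
    cgroot=${1:-/sys/fs/cgroup}
    find "$cgroot" -name io.pressure -type f 2>/dev/null | while read -r f; do
        # Each io.pressure file has a "some" line and a "full" line, e.g.
        #   full avg10=0.00 avg60=0.00 avg300=0.00 total=0
        awk -v cg="${f%/io.pressure}" \
            '$1 == "full" { sub(/^avg10=/, "", $2); print $2, cg }' "$f"
    done | sort -rn
}

# Example: io_pressure | head
```

Change "full" to "some" in the awk pattern to see the share of time in which at least one task in the cgroup was stalled on IO.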

Keep in mind this is a raid10, so maybe a stall on just one member device is enough to stop the whole array? That's why I included all block devices.
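
A minimal sketch of collecting the same blk-mq debugfs dump for several disks at once, one log file per disk. It wraps the find/grep command quoted above; the dump_blkmq name, the output directory, and the disk names in the example are assumptions, and it needs root with debugfs mounted.

```shell
# Sketch: dump blk-mq debugfs state for each named disk into its own log.
# Usage: dump_blkmq <debugfs block dir> <output dir> <disk>...
dump_blkmq() {
    root=$1; out=$2; shift 2
    mkdir -p "$out"
    for disk in "$@"; do
        # Skip names that have no debugfs directory.
        [ -d "$root/$disk" ] || continue
        (cd "$root/$disk" && find . -type f -exec grep -aH . {} \;) \
            > "$out/$disk.log" 2>/dev/null
    done
}

# Example (disk names are placeholders for the raid10 members):
# dump_blkmq /sys/kernel/debug/block /tmp/blkmq sda sdb sdc sdd
```

Taking the dumps back to back while the stall is in progress makes it easier to see whether any single member still holds in-flight requests.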

> If yes, the issue may not be related with BFQ, and should be related
> with blk-cgroup code.

The problem happens with cgroup.disable=io set. Does this setting affect blk-cgroup?

--
Chris Murphy