Re: 6.5.0rc5 fs hang - ext4? raid?

From: Dr. David Alan Gilbert
Date: Wed Aug 16 2023 - 10:45:42 EST


* Jens Axboe (axboe@xxxxxxxxx) wrote:
> On 8/15/23 7:31 PM, Dr. David Alan Gilbert wrote:
> > (Copying in Christoph and Jens)
> >
> > * Dr. David Alan Gilbert (dave@xxxxxxxxxxx) wrote:
> >> * Dr. David Alan Gilbert (dave@xxxxxxxxxxx) wrote:
> >>> * Theodore Ts'o (tytso@xxxxxxx) wrote:
> >>>> On Mon, Aug 14, 2023 at 09:02:53PM +0000, Dr. David Alan Gilbert wrote:
> >>>>> dg 29594 29592 0 18:40 pts/0 00:00:00 /usr/bin/ar --plugin /usr/libexec/gcc/x86_64-redhat-linux/13/liblto_plugin.so -csrDT src/intel/perf/libintel_perf.a src/intel/perf/libintel_perf.a.p/meson-generated_.._intel_perf_metrics.c.o src/intel/perf/libintel_perf.a.p/intel_perf.c.o src/intel/perf/libintel_perf.a.p/intel_perf_query.c.o src/intel/perf/libintel_perf.a.p/intel_perf_mdapi.c.o
> >>>>>
> >>>>> [root@dalek dg]# cat /proc/29594/stack
> >>>>> [<0>] md_super_wait+0xa2/0xe0
> >>>>> [<0>] md_bitmap_unplug+0xd2/0x120
> >>>>> [<0>] flush_bio_list+0xf3/0x100 [raid1]
> >>>>> [<0>] raid1_unplug+0x3b/0xb0 [raid1]
> >>>>> [<0>] __blk_flush_plug+0xd7/0x150
> >>>>> [<0>] blk_finish_plug+0x29/0x40
> >>>>> [<0>] ext4_do_writepages+0x401/0xc90
> >>>>> [<0>] ext4_writepages+0xad/0x180
> >>>>
> >>>> If you want a few seconds and try grabbing cat /proc/29594/stack
> >>>> again, what does the stack trace stay consistent as above?
> >>>
> >>> I'll get back to that and retry it.
> >>
> >> Yeh, the stack is consistent; this time around it's an 'ar' in a kernel
> >> build:
> >>
> >> [root@dalek dg]# cat /proc/17970/stack
> >> [<0>] md_super_wait+0xa2/0xe0
> >> [<0>] md_bitmap_unplug+0xad/0x120
> >> [<0>] flush_bio_list+0xf3/0x100 [raid1]
> >> [<0>] raid1_unplug+0x3b/0xb0 [raid1]
> >> [<0>] __blk_flush_plug+0xd7/0x150
> >> [<0>] blk_finish_plug+0x29/0x40
> >> [<0>] ext4_do_writepages+0x401/0xc90
> >> [<0>] ext4_writepages+0xad/0x180
> >> [<0>] do_writepages+0xd2/0x1e0
> >> [<0>] filemap_fdatawrite_wbc+0x63/0x90
> >> [<0>] __filemap_fdatawrite_range+0x5c/0x80
> >> [<0>] ext4_release_file+0x74/0xb0
> >> [<0>] __fput+0xf5/0x2a0
> >> [<0>] task_work_run+0x5d/0x90
> >> [<0>] exit_to_user_mode_prepare+0x1e6/0x1f0
> >> [<0>] syscall_exit_to_user_mode+0x1b/0x40
> >> [<0>] do_syscall_64+0x6c/0x90
> >> [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> >> [root@dalek dg]# cat /proc/17970/stack
> >> [<0>] md_super_wait+0xa2/0xe0
> >> [<0>] md_bitmap_unplug+0xad/0x120
> >> [<0>] flush_bio_list+0xf3/0x100 [raid1]
> >> [<0>] raid1_unplug+0x3b/0xb0 [raid1]
> >> [<0>] __blk_flush_plug+0xd7/0x150
> >> [<0>] blk_finish_plug+0x29/0x40
> >> [<0>] ext4_do_writepages+0x401/0xc90
> >> [<0>] ext4_writepages+0xad/0x180
> >> [<0>] do_writepages+0xd2/0x1e0
> >> [<0>] filemap_fdatawrite_wbc+0x63/0x90
> >> [<0>] __filemap_fdatawrite_range+0x5c/0x80
> >> [<0>] ext4_release_file+0x74/0xb0
> >> [<0>] __fput+0xf5/0x2a0
> >> [<0>] task_work_run+0x5d/0x90
> >> [<0>] exit_to_user_mode_prepare+0x1e6/0x1f0
> >> [<0>] syscall_exit_to_user_mode+0x1b/0x40
> >> [<0>] do_syscall_64+0x6c/0x90
> >> [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> >>
> >>>> Also, if you have iostat installed (usually part of the sysstat
> >>>> package), does "iostat 1" show any I/O activity on the md device?
> >>
> >> iostat is showing something odd, most devices are at 0,
> >> except for 3 of the dm's that are stuck at 100% utilisation with
> >> apparently nothing going on:
> >>
> >> avg-cpu: %user %nice %system %iowait %steal %idle
> >> 0.06 0.00 0.03 53.06 0.00 46.84
> >>
> >> Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
> >> ...
> >> dm-16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
> >> dm-17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
> >> dm-18 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> >> dm-19 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> >> dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> >> dm-20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
> >> ....
> >>
> >> dm-20 is the /dev/mapper/main-more which is the RAID on which the
> >> fs runs, 16 and 17 are main-more_rmeta_0 and main-more_rimage_0
> >> so something screwy is going on there.
> >
> > I've just finished a bisect of this hang, and got to:
> >
> > 615939a2ae734e3e68c816d6749d1f5f79c62ab7 is the first bad commit
> > commit 615939a2ae734e3e68c816d6749d1f5f79c62ab7
> > Author: Christoph Hellwig <hch@xxxxxx>
> > Date: Fri May 19 06:40:48 2023 +0200
> >
> > blk-mq: defer to the normal submission path for post-flush requests
> >
> > Requests with the FUA bit on hardware without FUA support need a post
> > flush before returning to the caller, but they can still be sent using
> > the normal I/O path after initializing the flush-related fields and
> > end I/O handler.
> >
> > Signed-off-by: Christoph Hellwig <hch@xxxxxx>
> > Reviewed-by: Bart Van Assche <bvanassche@xxxxxxx>
> > Link: https://lore.kernel.org/r/20230519044050.107790-6-hch@xxxxxx
> > Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>
> Can you try and pull in:
>
> https://git.kernel.dk/cgit/linux/commit/?h=block-6.5&id=5ff3213a5387e076af2b87f796f94b36965e8c3a
>
> and see if that helps?

<testing....>

Dave

> --
> Jens Axboe
>
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/