Fwd: fdatasync to a block device seems to block writes on unrelated devices

From: Bagas Sanjaya
Date: Mon Nov 20 2023 - 04:04:52 EST


Hi,

I noticed a bug report on Bugzilla [1]. Quoting from it:

> I was running nwipe on a failing hard drive that was running very slow and while nwipe was running fdatasync it seemed to cause delays for the filesystems on the other drives. The other drives are attached to the same onboard ahci sata adapter if that is important. After stopping nwipe, performance returned to normal
>
> The system is using ext4 filesystems on top of LVM on top of Linux RAID6 and the kernel is 6.1.53.
>
> Is this a design problem with fdatasync or could it be something else?
>
> Nov 18 08:10:27 server kernel: sysrq: Show Blocked State
> Nov 18 08:10:27 server kernel: task:nwipe state:D stack:0 pid:61181 ppid:42337 flags:0x00004000
> Nov 18 08:10:27 server kernel: Call Trace:
> Nov 18 08:10:27 server kernel: <TASK>
> Nov 18 08:10:27 server kernel: __schedule+0x2f8/0x870
> Nov 18 08:10:27 server kernel: schedule+0x55/0xc0
> Nov 18 08:10:27 server kernel: io_schedule+0x3d/0x60
> Nov 18 08:10:27 server kernel: folio_wait_bit_common+0x12c/0x300
> Nov 18 08:10:27 server kernel: ? filemap_invalidate_unlock_two+0x30/0x30
> Nov 18 08:10:27 server kernel: write_cache_pages+0x1c6/0x460
> Nov 18 08:10:27 server kernel: ? dirty_background_bytes_handler+0x20/0x20
> Nov 18 08:10:27 server kernel: generic_writepages+0x76/0xa0
> Nov 18 08:10:27 server kernel: do_writepages+0xbb/0x1c0
> Nov 18 08:10:27 server kernel: filemap_fdatawrite_wbc+0x56/0x80
> Nov 18 08:10:27 server kernel: __filemap_fdatawrite_range+0x53/0x70
> Nov 18 08:10:27 server kernel: file_write_and_wait_range+0x3c/0x90
> Nov 18 08:10:27 server kernel: blkdev_fsync+0xe/0x30
> Nov 18 08:10:27 server kernel: __x64_sys_fdatasync+0x46/0x80
> Nov 18 08:10:27 server kernel: do_syscall_64+0x3a/0xb0
> Nov 18 08:10:27 server kernel: entry_SYSCALL_64_after_hwframe+0x5e/0xc8
> Nov 18 08:10:27 server kernel: RIP: 0033:0x7f02a735f00b
> Nov 18 08:10:27 server kernel: RSP: 002b:00007f02a6858c80 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
> Nov 18 08:10:27 server kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f02a735f00b
> Nov 18 08:10:27 server kernel: RDX: 0000000000000002 RSI: 00007f02a6858d80 RDI: 0000000000000004
> Nov 18 08:10:27 server kernel: RBP: 00000118badb6000 R08: 0000000000000000 R09: 00007f02a0000080
> Nov 18 08:10:27 server kernel: R10: 0000000000001000 R11: 0000000000000293 R12: 00000000000186a0
> Nov 18 08:10:27 server kernel: R13: 00000000000186a0 R14: 0000000000001000 R15: 000055b7a0775850
> Nov 18 08:10:27 server kernel: </TASK>
> Nov 18 08:10:27 server kernel: task:kworker/u64:4 state:D stack:0 pid:7842 ppid:2 flags:0x00004000
> Nov 18 08:10:27 server kernel: Workqueue: writeback wb_workfn (flush-8:0)
> Nov 18 08:10:27 server kernel: Call Trace:
> Nov 18 08:10:27 server kernel: <TASK>
> Nov 18 08:10:27 server kernel: __schedule+0x2f8/0x870
> Nov 18 08:10:27 server kernel: schedule+0x55/0xc0
> Nov 18 08:10:27 server kernel: io_schedule+0x3d/0x60
> Nov 18 08:10:27 server kernel: blk_mq_get_tag+0x115/0x2a0
> Nov 18 08:10:27 server kernel: ? destroy_sched_domains_rcu+0x20/0x20
> Nov 18 08:10:27 server kernel: __blk_mq_alloc_requests+0x18c/0x2e0
> Nov 18 08:10:27 server kernel: blk_mq_submit_bio+0x3dc/0x590
> Nov 18 08:10:27 server kernel: __submit_bio+0xec/0x170
> Nov 18 08:10:27 server kernel: submit_bio_noacct_nocheck+0x2bd/0x2f0
> Nov 18 08:10:27 server kernel: ? submit_bio_noacct+0x68/0x440
> Nov 18 08:10:27 server kernel: __block_write_full_page+0x1ef/0x4c0
> Nov 18 08:10:27 server kernel: ? bh_uptodate_or_lock+0x70/0x70
> Nov 18 08:10:27 server kernel: ? blkdev_write_begin+0x20/0x20
> Nov 18 08:10:27 server kernel: __writepage+0x14/0x60
> Nov 18 08:10:27 server kernel: write_cache_pages+0x172/0x460
> Nov 18 08:10:27 server kernel: ? dirty_background_bytes_handler+0x20/0x20
> Nov 18 08:10:27 server kernel: generic_writepages+0x76/0xa0
> Nov 18 08:10:27 server kernel: do_writepages+0xbb/0x1c0
> Nov 18 08:10:27 server kernel: ? __wb_calc_thresh+0x46/0x130
> Nov 18 08:10:27 server kernel: __writeback_single_inode+0x30/0x1a0
> Nov 18 08:10:27 server kernel: writeback_sb_inodes+0x205/0x4a0
> Nov 18 08:10:27 server kernel: __writeback_inodes_wb+0x47/0xe0
> Nov 18 08:10:27 server kernel: wb_writeback.isra.0+0x189/0x1d0
> Nov 18 08:10:27 server kernel: wb_workfn+0x1d0/0x3a0
> Nov 18 08:10:27 server kernel: process_one_work+0x1e5/0x320
> Nov 18 08:10:27 server kernel: worker_thread+0x45/0x3a0
> Nov 18 08:10:27 server kernel: ? rescuer_thread+0x390/0x390
> Nov 18 08:10:27 server kernel: kthread+0xd5/0x100
> Nov 18 08:10:27 server kernel: ? kthread_complete_and_exit+0x20/0x20
> Nov 18 08:10:27 server kernel: ret_from_fork+0x22/0x30
> Nov 18 08:10:27 server kernel: </TASK>
> Nov 18 08:10:27 server kernel: task:rm state:D stack:0 pid:54615 ppid:54597 flags:0x00004000
> Nov 18 08:10:27 server kernel: Call Trace:
> Nov 18 08:10:27 server kernel: <TASK>
> Nov 18 08:10:27 server kernel: __schedule+0x2f8/0x870
> Nov 18 08:10:27 server kernel: schedule+0x55/0xc0
> Nov 18 08:10:27 server kernel: io_schedule+0x3d/0x60
> Nov 18 08:10:27 server kernel: bit_wait_io+0x8/0x50
> Nov 18 08:10:27 server kernel: __wait_on_bit+0x46/0x100
> Nov 18 08:10:27 server kernel: ? bit_wait+0x50/0x50
> Nov 18 08:10:27 server kernel: out_of_line_wait_on_bit+0x8c/0xb0
> Nov 18 08:10:27 server kernel: ? sugov_start+0x140/0x140
> Nov 18 08:10:27 server kernel: ext4_read_bh+0x6e/0x80
> Nov 18 08:10:27 server kernel: ext4_bread+0x45/0x60
> Nov 18 08:10:27 server kernel: __ext4_read_dirblock+0x4d/0x330
> Nov 18 08:10:27 server kernel: htree_dirblock_to_tree+0xa7/0x370
> Nov 18 08:10:27 server kernel: ? path_lookupat+0x92/0x190
> Nov 18 08:10:27 server kernel: ? filename_lookup+0xdf/0x1e0
> Nov 18 08:10:27 server kernel: ext4_htree_fill_tree+0x108/0x3c0
> Nov 18 08:10:27 server kernel: ext4_readdir+0x725/0xb40
> Nov 18 08:10:27 server kernel: iterate_dir+0x16a/0x1b0
> Nov 18 08:10:27 server kernel: __x64_sys_getdents64+0x7f/0x120
> Nov 18 08:10:27 server kernel: ? compat_filldir+0x180/0x180
> Nov 18 08:10:27 server kernel: do_syscall_64+0x3a/0xb0
> Nov 18 08:10:27 server kernel: entry_SYSCALL_64_after_hwframe+0x5e/0xc8
> Nov 18 08:10:27 server kernel: RIP: 0033:0x7f8e32834897
> Nov 18 08:10:27 server kernel: RSP: 002b:00007fffa3fb78c8 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
> Nov 18 08:10:27 server kernel: RAX: ffffffffffffffda RBX: 0000558d8c4f8a70 RCX: 00007f8e32834897
> Nov 18 08:10:27 server kernel: RDX: 0000000000008000 RSI: 0000558d8c4f8aa0 RDI: 0000000000000004
> Nov 18 08:10:27 server kernel: RBP: 0000558d8c4f8aa0 R08: 0000000000000030 R09: 00007f8e3292da60
> Nov 18 08:10:27 server kernel: R10: 00007f8e3292e140 R11: 0000000000000293 R12: ffffffffffffff88
> Nov 18 08:10:27 server kernel: R13: 0000558d8c4f8a74 R14: 0000000000000000 R15: 0000558d8c503c78
> Nov 18 08:10:27 server kernel: </TASK>

The reporter later added:

> Also, I was running badblocks -b 4096 -w -s -v on the failing hard drive for a few days before trying nwipe and it didn't seem to be causing slowdowns on the server and the man page for badblocks says it uses Direct I/O by default. I decided to try nwipe as it provides the option to disable read verifying.
>
> I could probably try removing fdatasync from nwipe or modifying it to use Direct I/O, but I haven't done that yet.
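
For anyone who wants to try the Direct I/O variant the reporter mentions, here is a rough sketch of what such a write loop could look like. This is not nwipe code: the helper name write_pattern_direct, its parameters, and the fixed 4096-byte buffer alignment are my own assumptions for illustration. It opens the target with O_DIRECT so the writes bypass the page cache, falls back to buffered writes where O_DIRECT is rejected, and issues a single fdatasync() at the end.

```c
#define _GNU_SOURCE /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0 /* not available here: degrade to buffered writes */
#endif

/*
 * Hypothetical helper (not taken from nwipe): overwrite the first
 * nblocks * block_size bytes of `path` with a single byte value,
 * bypassing the page cache when the target accepts O_DIRECT.
 * block_size is assumed to be a multiple of the logical block size.
 * Returns 0 on success, -1 on error with errno set.
 */
int write_pattern_direct(const char *path, size_t block_size,
                         size_t nblocks, unsigned char pattern)
{
    void *buf = NULL;
    int rc = -1;
    int saved_errno;

    int fd = open(path, O_WRONLY | O_DIRECT);
    if (fd < 0 && errno == EINVAL)
        fd = open(path, O_WRONLY); /* target rejects O_DIRECT */
    if (fd < 0)
        return -1;

    /* O_DIRECT requires an aligned user buffer; 4096 is an assumed
     * alignment that covers common logical block sizes. */
    int err = posix_memalign(&buf, 4096, block_size);
    if (err != 0) {
        errno = err;
        buf = NULL;
        goto out;
    }
    memset(buf, pattern, block_size);

    for (size_t i = 0; i < nblocks; i++) {
        ssize_t n = write(fd, buf, block_size);
        if (n != (ssize_t)block_size) {
            if (n >= 0)
                errno = EIO; /* treat a short write as an error */
            goto out;
        }
    }

    /* With O_DIRECT no dirty pages should remain for this data, so this
     * mainly asks the device to flush its cache. In the buffered
     * fallback it also writes back the page cache. */
    if (fdatasync(fd) == 0)
        rc = 0;

out:
    saved_errno = errno;
    free(buf);
    close(fd);
    errno = saved_errno;
    return rc;
}
```

A caller would use something like write_pattern_direct("/dev/sdX", 4096, count, 0x00), where /dev/sdX is a placeholder for the disk being wiped. Whether this actually avoids the stalls described above is untested.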

See Bugzilla for the full thread.

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=218158

--
An old man doll... just what I always wanted! - Clara