Tasks stuck jbd2 for a long time

From: Bhatnagar, Rishabh
Date: Tue Aug 15 2023 - 15:03:27 EST


Hi Jan/Ted

We are seeing lockups in journaling EXT4 code (5.10 - 6.1) under heavy load. The stack traces seem to suggest that kjournald thread is blocked for a long time.
Kjournald thread seem to be waiting on writeback thread to decrement t_updates and other writeback threads seem to be waiting on kjournald to flush the current transaction.
The system completely hangs in this case and the system IO drops to zero after sometime.

This is a RAID0 setup with 4 nvme (7TB each) disks. There is 390GB RAM available. The issue occurs when user starts downloading a big enough data set (60-70% disk capacity).
This is observed on 5.10 kernels (5.10.184). We tried moving to 6.1 kernels and saw similar issue. The system completely freezes and we see these stack traces in serial console.

We have tried experimenting with dirty_ratio, dirty_background_ratio, noatime/lazytime updates but don't see much improvement.
One thing that helps is disabling journaling completely. Testing is ongoing after increasing the journal size. (current size 128MB).
We are trying to understand why journal threads are stuck for such a long time. This causes the entire IO stall in the system. Let us know if you have seen something similar before and if there are any suggestions that we can try.

INFO: task kworker/u145:1:376 blocked for more than 120 seconds.
      Not tainted 5.10.184-175.731.amzn2.x86_64
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u145:1  state:D stack :    0 pid:  376 ppid:     2 flags:0x00004000
Workqueue : writeback wb_workfn (flush-9:0)
Call Trace:
 __schedule+0x1f9/0x660
 schedule+0x46/0xb0
 wait_transaction_locked+0x8a/0xd0 [jbd2]
 ? add_wait_queue_exclusive+0x70/0x70
 add_transaction_credits+0xd6/0x2a0 [jbd2]
 ? blk_mq_flush_plug_list+0x100/0x1a0
 start_this_handle+0x12d/0x4d0 [jbd2]
 ? jbd2__journal_start+0xc3/0x1e0 [jbd2]
 ? kmem_cache_alloc+0x132/0x270
 jbd2__journal_start+0xfb/0x1e0 [jbd2]
 __ext4_journal_start_sb+0xfb/0x110 [ext4]
 ext4_writepages+0x32c/0x790 [ext4]
 do_writepages+0x34/0xc0
 ? write_inode+0x54/0xd0
 __writeback_single_inode+0x39/0x200
 writeback_sb_inodes+0x20d/0x4a0
 __writeback_inodes_wb+0x4c/0xe0
 wb_writeback+0x1d8/0x2a0
 wb_do_writeback+0x166/0x180
 wb_workfn+0x6e/0x250
 ? __switch_to_asm+0x3a/0x60
 ? __schedule+0x201/0x660
 process_one_work+0x1b0/0x350
 worker_thread+0x49/0x310
 ? process_one_work+0x350/0x350
 kthread+0x11b/0x140
 ? __kthread_bind_mask+0x60/0x60
 ret_from_fork+0x22/0x30

INFO: task jbd2/md0-8:8068 blocked for more than 120 seconds.

      Not tainted 5.10.184-175.731.amzn2.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:jbd2/md0-8      state:D stack:    0 pid: 8068 ppid:     2 flags:0x00004080
Call Trace:
__schedule+0x1f9/0x660
 schedule+0x46/0xb0
 jbd2_journal_commit_transaction+0x35d/0x1880 [jbd2]
 ? update_load_avg+0x7a/0x5d0
 ? add_wait_queue_exclusive+0x70/0x70
 ? lock_timer_base+0x61/0x80
 ? kjournald2+0xcf/0x360 [jbd2]
 kjournald2+0xcf/0x360 [jbd2]
 ? add_wait_queue_exclusive+0x70/0x70
 ? load_superblock.part.0+0xb0/0xb0 [jbd2]
 kthread+0x11b/0x140
 ? __kthread_bind_mask+0x60/0x60
 ret_from_fork+0x22/0x30

INFO: task kvstore-leaf:39161 blocked for more than 121 seconds.
      Not tainted 5.10.184-175.731.amzn2.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kvstore-leaf    state:D stack:    0 pid:39161 ppid: 39046 flags:0x00000080
Call Trace:
 __schedule+0x1f9/0x660
 schedule+0x46/0xb0
 wait_transaction_locked+0x8a/0xd0 [jbd2]
 ? add_wait_queue_exclusive+0x70/0x70
 add_transaction_credits+0xd6/0x2a0 [jbd2]
 start_this_handle+0x12d/0x4d0 [jbd2]
 ? jbd2__journal_start+0x91/0x1e0 [jbd2]
 ? kmem_cache_alloc+0x132/0x270
 jbd2__journal_start+0xfb/0x1e0 [jbd2]
 __ext4_journal_start_sb+0xfb/0x110 [ext4]
 ext4_dirty_inode+0x3d/0x90 [ext4]
 __mark_inode_dirty+0x196/0x300
 generic_update_time+0x68/0xd0
 file_update_time+0x127/0x140
 ? generic_write_checks+0x61/0xd0
 ext4_buffered_write_iter+0x52/0x160 [ext4]
 new_sync_write+0x11c/0x1b0
 vfs_write+0x1c9/0x260
 ksys_write+0x5f/0xe0
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x61/0xc6


Thanks
Rishabh