Re: [PATCH] xfs: don't do inodgc work if task is exiting

From: Dave Chinner
Date: Fri May 12 2023 - 18:54:41 EST


On Fri, May 12, 2023 at 09:15:11AM -0600, Tycho Andersen wrote:
> INFO: task java:3546393 blocked for more than 1912 seconds.
> Tainted: G OE 5.15.35netflix-g54efd87a8576 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:skyframe-evalua state:D stack: 0 pid:3546393 ppid:3532734 flags:0x00000220
> Call Trace:
> <TASK>
> __schedule+0x2c5/0x8d0
> schedule+0x3a/0xa0
> schedule_timeout+0x115/0x280
> ? xfs_buf_read_map+0x52/0x2e0 [xfs]
> ? xfs_da_read_buf+0xcf/0x120 [xfs]
> __down+0x90/0xe0
> ? down+0x43/0x60
> down+0x43/0x60
> xfs_buf_lock+0x29/0xa0 [xfs]
> xfs_buf_find.isra.34+0x1fd/0x610 [xfs]
> xfs_buf_get_map+0x4c/0x400 [xfs]
> xfs_buf_read_map+0x52/0x2e0 [xfs]
> ? xfs_read_agi+0x8c/0x120 [xfs]
> xfs_trans_read_buf_map+0x223/0x2d0 [xfs]
> ? xfs_read_agi+0x8c/0x120 [xfs]
> xfs_read_agi+0x8c/0x120 [xfs]
> xfs_ialloc_read_agi+0x2a/0x90 [xfs]
> xfs_dialloc+0x1ab/0x680 [xfs]
> ? xfs_trans_reserve_quota_icreate+0x32/0x40 [xfs]
> ? xfs_trans_alloc_icreate+0x75/0xf0 [xfs]
> xfs_create+0x39e/0x5b0 [xfs]
> xfs_generic_create+0x129/0x380 [xfs]
> ? generic_permission+0x27/0x200
> vfs_create+0x12b/0x1c0
> ovl_create_real+0xd7/0x220 [overlay]
> ovl_create_or_link+0x166/0x670 [overlay]

Lots of stuff blocked on the AGI lock doing inode allocation. Kinda
what I expected to see...

> INFO: task kworker/66:1:3548191 blocked for more than 1912 seconds.
> Tainted: G OE 5.15.35netflix-g54efd87a8576 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/66:1 state:D stack: 0 pid:3548191 ppid: 2 flags:0x00004000
> Workqueue: xfs-inodegc/nvme1n1 xfs_inodegc_worker [xfs]
> Call Trace:
> <TASK>
> __schedule+0x2c5/0x8d0
> schedule+0x3a/0xa0
> schedule_timeout+0x115/0x280
> __down+0x90/0xe0
> ? down+0x43/0x60
> down+0x43/0x60
> xfs_buf_lock+0x29/0xa0 [xfs]
> xfs_buf_find.isra.34+0x1fd/0x610 [xfs]
> xfs_buf_get_map+0x4c/0x400 [xfs]
> xfs_buf_read_map+0x52/0x2e0 [xfs]
> ? xfs_read_agi+0x8c/0x120 [xfs]
> xfs_trans_read_buf_map+0x223/0x2d0 [xfs]
> ? xfs_read_agi+0x8c/0x120 [xfs]
> xfs_read_agi+0x8c/0x120 [xfs]
> xfs_iunlink_remove+0x5a/0x220 [xfs]
> xfs_ifree+0x7a/0x4c0 [xfs]
> ? xfs_trans_alloc+0xec/0x1e0 [xfs]
> xfs_inactive_ifree+0xa1/0x1a0 [xfs]
> xfs_inactive+0xf1/0x170 [xfs]
> xfs_inodegc_worker+0x76/0x100 [xfs]
> process_one_work+0x200/0x3d0
> worker_thread+0x2d/0x3e0
> ? process_one_work+0x3d0/0x3d0
> kthread+0x11a/0x140
> ? set_kthread_struct+0x40/0x40
> ret_from_fork+0x22/0x30
> </TASK>

Yup, there's the stuck inodegc worker - also trying to get the AGI
lock doing inode unlink. I'm not surprised at all by this - working
out why this worker is stuck here is the goal because that is what
is causing all your flush problems.

It also gives me a better idea of what the likely issue is, too.

> INFO: task bazelenv:3548683 blocked for more than 1912 seconds.
> Tainted: G OE 5.15.35netflix-g54efd87a8576 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:bazelenv state:D stack: 0 pid:3548683 ppid:3435211 flags:0x00000220
> Call Trace:
> <TASK>
> __schedule+0x2c5/0x8d0
> schedule+0x3a/0xa0
> schedule_timeout+0x115/0x280
> __down+0x90/0xe0
> ? down+0x43/0x60
> down+0x43/0x60
> xfs_buf_lock+0x29/0xa0 [xfs]
> xfs_buf_find.isra.34+0x1fd/0x610 [xfs]
> xfs_buf_get_map+0x4c/0x400 [xfs]
> xfs_buf_read_map+0x52/0x2e0 [xfs]
> ? xfs_read_agf+0x84/0x100 [xfs]
> xfs_trans_read_buf_map+0x223/0x2d0 [xfs]
> ? xfs_read_agf+0x84/0x100 [xfs]
> xfs_read_agf+0x84/0x100 [xfs]
> xfs_alloc_read_agf+0x3a/0x1b0 [xfs]
> xfs_alloc_fix_freelist+0x434/0x500 [xfs]
> ? __kmalloc+0x3c0/0x400
> ? xfs_dquot_to_disk+0xfe/0x110 [xfs]
> ? xfs_qm_dquot_logitem_format+0x89/0x150 [xfs]
> xfs_free_extent_fix_freelist+0x61/0xa0 [xfs]
> __xfs_free_extent+0x6a/0x1c0 [xfs]
> xfs_trans_free_extent+0x3b/0xd0 [xfs]
> xfs_extent_free_finish_item+0x23/0x40 [xfs]
> xfs_defer_finish_noroll+0x24b/0x5a0 [xfs]
> xfs_defer_finish+0x13/0x80 [xfs]
> xfs_itruncate_extents_flags+0x13b/0x220 [xfs]
> xfs_setattr_size+0x35c/0x3d0 [xfs]
> ? aa_compute_fperms+0x16f/0x190
> xfs_vn_setattr+0xf7/0x110 [xfs]

And that confirms the suspicions I have - another process blocked on
an AGF lock. That might be IO it's blocking on here, but combined
with the inode unlink being stuck in inodegc, I have a suspicion
that this is an AGF - AGI - inode buffer lock ordering issue.

There's been a heap of work done in this space since 5.15, and I'd
suggest that almost none of it is trivially backportable as it's
deeply intertwined with relatively complex algorithm changes.

I also recently found evidence of another AGI -> inode buffer -> AGF
-> inode buffer deadlock issue debugging another hang from a machine
making heavy use of O_TMPFILEs via OVL. Without a kernel code to
examine in intricate detail, I can't say if this is what you are
seeing.

I also don't have a fix for that yet because, well, it's complex and
I can really only accomodate one complex high priority bug at a time
in my head and it hasn't been my highiest priority. As it is, I
don't think the fix for it will be trivially backportable to 5.15,
either....

> INFO: task journalctl:3549038 blocked for more than 1912 seconds.
> Tainted: G OE 5.15.35netflix-g54efd87a8576 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:journalctl state:D stack: 0 pid:3549038 ppid:469330 flags:0x00000224
> Call Trace:
> <TASK>
> __schedule+0x2c5/0x8d0
> schedule+0x3a/0xa0
> schedule_timeout+0x115/0x280
> ? __traceiter_sched_wakeup+0x29/0x40
> ? ttwu_do_wakeup+0x106/0x170
> wait_for_completion+0x9f/0x100
> __flush_work+0x161/0x1f0
> ? worker_detach_from_pool+0xb0/0xb0
> xfs_inodegc_flush.part.25+0x4f/0xa0 [xfs]
> xfs_fs_statfs+0x35/0x1e0 [xfs]
> statfs_by_dentry+0x67/0x90
> vfs_statfs+0x16/0xd0
> ovl_statfs+0x40/0x70 [overlay]

Fixed by commit 5e672cd69f0a ("xfs: introduce xfs_inodegc_push()")
which also means that the commits Darrick suggested you try aren't
relevant. The mod_delayed_work() issues they address weren't
introduced until commit 7cf2b0f9611b ("xfs: bound maximum wait time
for inodegc work") which has a direct dependency on 5e672cd69f0a
being present....

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx