[PATCH] xfs: don't do inodgc work if task is exiting

From: Tycho Andersen
Date: Thu May 11 2023 - 11:17:48 EST


From: Tycho Andersen <tandersen@xxxxxxxxxxx>

Similar to 5a8bee63b10f ("fuse: in fuse_flush only wait if someone wants
the return code"), we have a task that is stuck that can't be killed, so a
pid ns can't exit, wreaking all kinds of havoc:

INFO: task C2 CompilerThre:3546103 blocked for more than 1912 seconds.
Tainted: G OE 5.15.35netflix-g54efd87a8576 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:C2 CompilerThre state:D stack: 0 pid:3546103 ppid:3546047 flags:0x00004222
Call Trace:
<TASK>
__schedule+0x2c5/0x8d0
schedule+0x3a/0xa0
schedule_timeout+0x115/0x280
? __schedule+0x2cd/0x8d0
wait_for_completion+0x9f/0x100
__flush_work+0x161/0x1f0
? worker_detach_from_pool+0xb0/0xb0
destroy_inode+0x3b/0x70
__dentry_kill+0xcc/0x160
dput+0x141/0x2e0
ovl_destroy_inode+0x15/0x50 [overlay]
destroy_inode+0x3b/0x70
__dentry_kill+0xcc/0x160
dput+0x141/0x2e0
__fput+0xe1/0x250
task_work_run+0x73/0xb0
do_exit+0x37e/0xb80
do_group_exit+0x3a/0xa0
get_signal+0x140/0x870
? perf_event_groups_first+0x80/0x80
arch_do_signal_or_restart+0xae/0x7c0
? __x64_sys_futex+0x5e/0x1d0
? __x64_sys_futex+0x5e/0x1d0
exit_to_user_mode_prepare+0x10f/0x1c0
syscall_exit_to_user_mode+0x26/0x40
do_syscall_64+0x46/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f3295cf3cf5
RSP: 002b:00007f327c834d00 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 00007f32900bde50 RCX: 00007f3295cf3cf5
RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f32900bde78
RBP: 00007f327c834dd0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f32900bde74
R13: 00007f32900bde78 R14: 00007f32900bde28 R15: 0000000000000000
</TASK>

The bad call path is:

xfs_fs_destroy_inode() ->
xfs_inode_mark_reclaimable ->
xfs_inodegc_queue() ->
xfs_inodegc_want_queue_work()
xfs_inodegc_want_flush_work() ->
flush_work() ->
__flush_work() ->
wait_for_completion()

We can avoid this task getting stuck by just not queuing the gc work from
do_exit().

The fact that there's a lockup at all probably indicative of another xfs
bug somewhere else that I am still looking for, but we should at least not
generate unkillable tasks as a result.

Signed-off-by: Tycho Andersen <tandersen@xxxxxxxxxxx>
CC: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
---
fs/xfs/xfs_icache.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 351849fc18ff..90e94d84f8ad 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -2011,6 +2011,9 @@ xfs_inodegc_want_queue_work(
*
* Note: If the current thread is running a transaction, we don't ever want to
* wait for other transactions because that could introduce a deadlock.
+ * If the task is currently exiting there is nobody to wait for
+ * the flush and it can deadlock, so we should not try to flush in this case
+ * either.
*/
static inline bool
xfs_inodegc_want_flush_work(
@@ -2021,6 +2024,9 @@ xfs_inodegc_want_flush_work(
if (current->journal_info)
return false;

+ if (current->flags & PF_EXITING)
+ return false;
+
if (shrinker_hits > 0)
return true;


base-commit: 78b421b6a7c6dbb6a213877c742af52330f5026d
--
2.34.1