Re: [6.5-rc5 regression] core dump hangs (was Re: [Bug report] fstests generic/051 (on xfs) hang on latest linux v6.5-rc5+)

From: Eric W. Biederman
Date: Mon Jun 12 2023 - 11:53:12 EST



Can someone who can reproduce the hang run the test patch below?

I am currently drawing a blank looking at the changes, so I am
proposing some debug code to help us narrow things down.

The tests reproducing this don't appear to use /dev/vhost-net or
/dev/vhost-vsock. So if the WARN_ON's trigger, it is a good sign
that the code connected to the WARN_ON's is wrong.
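
For anyone reading along, the flag test that the new WARN_ON's hang off
can be modeled in userspace. This is only a sketch: the flag values are
copied by hand and assumed to match include/linux/sched.h, and
counted_for_coredump() is a hypothetical helper name, not a kernel
function.

```c
#include <stdbool.h>

/*
 * Illustrative copies of the task flag bits. The values are an
 * assumption (believed to match include/linux/sched.h around v6.4).
 */
#define PF_IO_WORKER   0x00000010u
#define PF_USER_WORKER 0x00004000u

/*
 * Hypothetical helper mirroring the condition used in zap_process()
 * and zap_other_threads(): returns true when the thread is counted
 * (nr++ / count++), false when it falls into the new else branch
 * and would fire WARN_ON_ONCE(true).
 */
static bool counted_for_coredump(unsigned int flags)
{
	return (flags & (PF_USER_WORKER | PF_IO_WORKER)) != PF_USER_WORKER;
}
```

Only a task with PF_USER_WORKER set and PF_IO_WORKER clear takes the
else branch. Ordinary threads (neither bit) and tasks with both bits
set (which I understand to be the io_uring worker case) are still
counted.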

If the WARN_ON's don't trigger, I suspect the code in kernel/fork.c.

But as I said, staring at the code I don't see anything wrong.

Eric


diff --git a/fs/coredump.c b/fs/coredump.c
index 88740c51b942..e9acf0a2d2f0 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -374,6 +374,7 @@ static int zap_process(struct task_struct *start, int exit_code)
/* The vhost_worker does not particpate in coredumps */
if ((t->flags & (PF_USER_WORKER | PF_IO_WORKER)) != PF_USER_WORKER)
nr++;
+ else WARN_ON_ONCE(true);
}
}

diff --git a/kernel/exit.c b/kernel/exit.c
index edb50b4c9972..56002a58ec33 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -437,6 +437,7 @@ static void coredump_task_exit(struct task_struct *tsk)
}
__set_current_state(TASK_RUNNING);
}
+ else if (core_state) WARN_ON_ONCE(true);
}

#ifdef CONFIG_MEMCG
diff --git a/kernel/signal.c b/kernel/signal.c
index 2547fa73bde5..1be27dbbce62 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1371,6 +1371,7 @@ int zap_other_threads(struct task_struct *p)
/* Don't require de_thread to wait for the vhost_worker */
if ((t->flags & (PF_IO_WORKER | PF_USER_WORKER)) != PF_USER_WORKER)
count++;
+ else WARN_ON_ONCE(true);

/* Don't bother with already dead threads */
if (t->exit_state)