Re: [6.5-rc5 regression] core dump hangs (was Re: [Bug report] fstests generic/051 (on xfs) hang on latest linux v6.5-rc5+)

From: Linus Torvalds
Date: Mon Jun 12 2023 - 13:51:48 EST


On Mon, Jun 12, 2023 at 10:29 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> Looks fine to me to just kill it indeed, whatever we did need this
> for is definitely no longer the case. I _think_ we used to have
> something in the worker exit that would potentially sleep which
> is why we killed it before doing that, now it just looks like dead
> code.

Ok, can you (and the fsstress people) confirm that this
whitespace-damaged patch fixes the coredump issue:


--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -221,9 +221,6 @@ static void io_worker_exit(..
raw_spin_unlock(&wq->lock);
io_wq_dec_running(worker);
worker->flags = 0;
- preempt_disable();
- current->flags &= ~PF_IO_WORKER;
- preempt_enable();

kfree_rcu(worker, rcu);
io_worker_ref_put(wq);

Jens, I think that the two lines above there, ie the whole

io_wq_dec_running(worker);
worker->flags = 0;

thing may actually be the (partial?) reason for those PF_IO_WORKER
games. It's basically doing "now I'm doing stats by hand", and I
wonder if now it decrements the running worker one time too much?

Ie when the finally *dead* worker schedules away, never to return,
that's when that io_wq_worker_sleeping() case triggers and decrements
things one more time.

So there might be some bookkeeping reason for those games, but it
looks like if that's the case, then that

worker->flags = 0;

will have already taken care of it.

I wonder if those two lines could just be removed too. But I think
that's separate from the "let's fix the core dumping" issue.

Linus