Re: [6.5-rc5 regression] core dump hangs (was Re: [Bug report] fstests generic/051 (on xfs) hang on latest linux v6.5-rc5+)

From: Zorro Lang
Date: Mon Jun 12 2023 - 03:26:00 EST


On Sun, Jun 11, 2023 at 08:14:25PM -0700, Linus Torvalds wrote:
> On Sun, Jun 11, 2023 at 7:22 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > I guess the regression fix needs a regression fix....
>
> Yup.
>
> From the description of the problem, it sounds like this happens on
> real hardware, no vhost anywhere?
>
> Or maybe Darrick (who doesn't see the issue) is running on raw
> hardware, and you and Zorro are running in a virtual environment?

I tested virtual environment and raw hardware both.

We have a testing machines pool which contains lots of different machines,
include real machines, kvm, and other kind of vm, include different
rches (aarch64, s390x, ppc64le and x86_64), and different kind of
storage (virt, hard RAID, generic scsi disk, pmem, etc...). They all hang
on fstests generic/051.

I remembered Darrick said he did test with ~160 VMs (need confirmation
from him).

So this issue might not be related with VMs or real machine. Hmm... maybe
related with some kernel config? If Darrick would like to provide his .config
file, I can make a diff with mine to check the difference.

Thanks,
Zorro

>
> It sounds like zap_other_threads() and coredump_task_exit() do not
> agree about the core_state->nr_threads counting, which is part of what
> changed there.
>
> [ Goes off to look ]
>
> Hmm. Both seem to be using the same test for
>
> (t->flags & (PF_IO_WORKER | PF_USER_WORKER)) != PF_USER_WORKER
>
> which I don't love - I don't think io_uring threads should participate
> in core dumping either, so I think the test could just be
>
> (t->flags & PF_IO_WORKER)
>
> but that shouldn't be the issue here.
>
> But according to
>
> https://lore.kernel.org/all/20230611124836.whfktwaumnefm5z5@zlang-mailbox/
>
> it's clearly hanging in wait_for_completion_state() in
> coredump_wait(), so it really looks like some confusion about that
> core_waiters (aka core_state->nr_threads) count.
>
> Oh. Humm. Mike changed that initial rough patch of mine, and I had
> moved the "if you don't participate in c ore dumps" test up also past
> the "do_coredump()" logic.
>
> And I think it's horribly *wrong* for a thread that doesn't get
> counted for core-dumping to go into do_coredump(), because then it
> will set the "core_state" to possibly be the core-state of the vhost
> thread that isn't even counted.
>
> So *maybe* this attached patch might fix it? I haven't thought very
> deeply about this, but vhost workers most definitely shouldn't call
> do_coredump(), since they are then not counted.
>
> (And again, I think we should just check that PF_IO_WORKER bit, not
> use this more complex test, but that's a separate and bigger change).
>
> Linus