Re: [PATCH v2 1/3] pidfd: allow pidfd_open() on non-thread-group leaders

From: Tycho Andersen
Date: Wed Dec 13 2023 - 14:18:41 EST


On Wed, Dec 13, 2023 at 01:18:03PM +0100, Christian Brauner wrote:
> On Mon, Dec 11, 2023 at 03:28:09PM +0800, kernel test robot wrote:
> >
> >
> > Hello,
> >
> > kernel test robot noticed "kernel-selftests.pidfd.pidfd_test.fail" on:
> >
> > commit: e6d9be676d2c1fa8332c63c4382b8d3227fca991 ("[PATCH v2 1/3] pidfd: allow pidfd_open() on non-thread-group leaders")
> > url: https://github.com/intel-lab-lkp/linux/commits/Tycho-Andersen/selftests-pidfd-add-non-thread-group-leader-tests/20231208-011135
> > patch link: https://lore.kernel.org/all/20231207170946.130823-1-tycho@tycho.pizza/
> > patch subject: [PATCH v2 1/3] pidfd: allow pidfd_open() on non-thread-group leaders
> >
> > in testcase: kernel-selftests
> > version: kernel-selftests-x86_64-60acb023-1_20230329
> > with following parameters:
> >
> > group: pidfd
> >
> >
> >
> > compiler: gcc-12
> > test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 32G memory
> >
> > (please refer to attached dmesg/kmsg for entire log/backtrace)
> >
> >
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> > | Closes: https://lore.kernel.org/oe-lkp/202312111516.26dc3fd5-oliver.sang@xxxxxxxxx
> >
> >
> > besides, we also observed kernel-selftests.pidfd.pidfd_poll_test.fail on this
> > commit, but clean on parent:
> >
> > bee0e7762ad2c602 e6d9be676d2c1fa8332c63c4382
> > ---------------- ---------------------------
> >        fail:runs  %reproduction    fail:runs
> >            |             |             |
> >            :6          100%           6:6     kernel-selftests.pidfd.pidfd_poll_test.fail
> >            :6          100%           6:6     kernel-selftests.pidfd.pidfd_test.fail
> >
> >
> >
> > TAP version 13
> > 1..7
> > # timeout set to 300
> > # selftests: pidfd: pidfd_test
> > # TAP version 13
> > # 1..8
> > # # Parent: pid: 2191
> > # # Parent: Waiting for Child (2192) to complete.
> > # # Child (pidfd): starting. pid 2192 tid 2192
> > # # Child Thread: starting. pid 2192 tid 2193 ; and sleeping
> > # # Child Thread: doing exec of sleep
> > # Bail out! pidfd_poll check for premature notification on child thread exec test: Unexpected epoll_wait result (c=0, events=0) (errno 0)
>
> So it seems that this broke multi-threaded exit notifications.

Yeah... I've been trying to figure out how to fix it.

When a non-leader thread execs, de_thread() calls release_task() on the
original thread-group leader, which I hadn't realized.
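For anyone following along, here is a minimal userspace sketch of the
scenario the failing pidfd_poll check exercises. It is not the selftest code
itself. It assumes Linux >= 5.3 for pidfd_open(2), a /bin/sleep binary, and
glibc headers that expose syscall(); the helper names are hypothetical. A
non-leader thread execs, and the parent expects its pidfd to become readable
only when the exec'd sleep exits (roughly a second later), not when
de_thread() tears down the old leader:

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/epoll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434 /* assumption: unified syscall table number (x86_64, arm64, ...) */
#endif

/* Hypothetical helper: runs in a non-leader thread and execs. de_thread()
 * kills and releases the original leader; this thread takes over its pid. */
static void *exec_from_thread(void *arg)
{
	(void)arg;
	usleep(100 * 1000);		/* let the parent start polling first */
	execl("/bin/sleep", "sleep", "1", (char *)NULL);
	_exit(127);			/* only reached if execl() failed */
}

/* Hypothetical helper: epoll-wait on a pidfd for @pid. Returns the seconds
 * until EPOLLIN (process exit) was reported, or -1.0 on error or timeout. */
static double wait_pidfd_exit(pid_t pid, int timeout_ms)
{
	struct epoll_event ev = { .events = EPOLLIN }, out;
	struct timespec t0, t1;
	double elapsed = -1.0;
	int pidfd, epfd;

	pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
	if (pidfd < 0)
		return -1.0;
	epfd = epoll_create1(EPOLL_CLOEXEC);
	ev.data.fd = pidfd;
	if (epfd >= 0 && epoll_ctl(epfd, EPOLL_CTL_ADD, pidfd, &ev) == 0) {
		clock_gettime(CLOCK_MONOTONIC, &t0);
		if (epoll_wait(epfd, &out, 1, timeout_ms) == 1 &&
		    (out.events & EPOLLIN)) {
			clock_gettime(CLOCK_MONOTONIC, &t1);
			elapsed = (double)(t1.tv_sec - t0.tv_sec) +
				  (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
		}
	}
	if (epfd >= 0)
		close(epfd);
	close(pidfd);
	return elapsed;
}

/* Fork a child whose second thread execs; return how long the parent waited
 * for the pidfd exit notification, or -1.0 on any failure. */
static double thread_exec_exit_delay(void)
{
	pid_t pid = fork();
	double elapsed;
	int status;

	if (pid < 0)
		return -1.0;
	if (pid == 0) {
		pthread_t t;

		if (pthread_create(&t, NULL, exec_from_thread, NULL) != 0)
			_exit(126);
		for (;;)
			pause();	/* leader idles until de_thread() kills it */
	}
	elapsed = wait_pidfd_exit(pid, 5000);
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
	    WEXITSTATUS(status) != 0)
		return -1.0;
	return elapsed;
}
```

On a working kernel the returned delay should be a bit over one second; a
value near 0.1s would be a premature notification, while -1.0 covers the case
reported above, where epoll_wait() timed out with c=0 (it is also returned on
other errors).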

Tycho