Re: More waitpid issues with CLONE_DETACHED/CLONE_THREAD

From: Linus Torvalds
Date: Sat Jan 31 2004 - 23:43:41 EST




On Sat, 31 Jan 2004, Daniel Jacobowitz wrote:
>
> This may be related to the python bug reported today...

Indeed.

Having a "waitpid(x, .., WNOHANG)" return 0 is a very interesting
condition. That condition basically guarantees that:

- the kernel did find the child
- but the kernel decided that the child cannot be reaped right then.
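
For reference, a minimal userspace sketch of what that return value looks
like from the caller's side. This is an illustration, not code from the
report: the helper names try_reap() and spawn_paused_child() are made up,
and the child here is simply still running, which is the ordinary way to
get a 0 back. If the kernel does not find the child at all, os.waitpid()
raises ChildProcessError (ECHILD) instead of returning 0.

```python
import os
import signal


def try_reap(pid):
    """Non-blocking wait. Returns None when waitpid() reports 0."""
    reaped_pid, status = os.waitpid(pid, os.WNOHANG)
    if reaped_pid == 0:
        # The kernel found the child but did not reap it right now.
        return None
    # The child was reaped; hand back the raw wait status.
    return status


def spawn_paused_child():
    """Fork a child that blocks until it receives a signal (hypothetical helper)."""
    pid = os.fork()
    if pid == 0:
        signal.pause()   # child: sleep until signalled
        os._exit(0)
    return pid


if __name__ == "__main__":
    child = spawn_paused_child()
    print("first poll:", try_reap(child))   # child is alive, nothing to reap
    os.kill(child, signal.SIGKILL)
    os.waitpid(child, 0)                    # blocking wait cleans up the child
```

The interesting case in this thread is the one where the poll keeps
returning None even though the child already shows up as a zombie.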

If you see the process as a Zombie in a "ps" listing, then we know that
the child still running isn't the reason why it couldn't be reaped. Can
you verify that /proc/<pid>/status shows it as "Z (zombie)"?
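
One way to do that check from a script, as a rough sketch: read the
"State:" line out of /proc/<pid>/status. The function names parse_state()
and proc_state() are invented for this example, and it assumes a Linux
/proc with the usual "State:<tab>Z (zombie)" layout.

```python
def parse_state(status_text):
    """Return the value of the 'State:' line, e.g. 'Z (zombie)', or None."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # Everything after the first colon, minus the tab padding.
            return line.split(":", 1)[1].strip()
    return None


def proc_state(pid):
    """Read /proc/<pid>/status and return its State value (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_state(f.read())


if __name__ == "__main__":
    import os
    print(proc_state(os.getpid()))
```

A child stuck in the situation described here would be expected to report
a value starting with "Z".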

In fact, if we see it as "Z (zombie)", we know even more: it means that
wait_task_zombie() was never called, because that would have started out
with changing the process state to "X (dead)".

And that in turn implies that "eligible_child()" would have returned 2.

Which is a normal occurrence: it happens when a thread group leader still
has threads attached to it. At that point it may be a Zombie, but we can't
reap it yet. The threads have to go away before the thing can be reaped.

Can you verify that that process doesn't have any sub-threads? (Again,
that should be easily visible in /proc/<pid>/task/).
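
A small sketch of that check, assuming a kernel that exposes
/proc/<pid>/task/ with one numeric entry per thread. The helpers
thread_ids() and sub_threads() and the proc_root parameter are made up for
illustration; proc_root is only there so the function can be pointed at a
different directory.

```python
import os


def thread_ids(pid, proc_root="/proc"):
    """Return the sorted thread IDs listed under <proc_root>/<pid>/task/."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    return sorted(int(name) for name in os.listdir(task_dir) if name.isdigit())


def sub_threads(pid, proc_root="/proc"):
    """Return every listed thread ID other than the group leader itself."""
    return [tid for tid in thread_ids(pid, proc_root) if tid != pid]


if __name__ == "__main__":
    me = os.getpid()
    print("threads:", thread_ids(me))
    print("sub-threads:", sub_threads(me))
```

An empty list from sub_threads() would mean no sub-threads are listed for
that process.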

Another alternative is that the process is a zombie, but it is being
traced. When that happens, it shows up on the "ptrace_children" list, and
we'll see it in wait4(), but we won't be able to reap it.
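
The possibilities above can be put together as a toy checklist. This is
only a userspace model of the reasoning in this mail, not kernel logic:
why_not_reaped() and its three inputs are hypothetical, and the caller is
assumed to have collected them by hand (the state string from
/proc/<pid>/status, a count of sub-threads, and whether the process is
being traced).

```python
def why_not_reaped(state, sub_thread_count, traced):
    """Map the observations discussed above to a likely explanation."""
    if not state.startswith("Z"):
        # Still running or stopped, or already past the zombie state.
        return "not a zombie"
    if sub_thread_count > 0:
        # Zombie group leader: its threads have to go away first.
        return "zombie group leader with threads still attached"
    if traced:
        # Visible to the wait, but cannot be reaped while traced.
        return "zombie, but being traced"
    return "zombie with no threads and no tracer: unexplained"


if __name__ == "__main__":
    print(why_not_reaped("Z (zombie)", 2, False))
```

The last branch is the one that would point at something strange going on.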

Roland, Ingo - have you followed the discussion on linux-kernel? Something
strange does seem to be going on..

Linus