Re: [PATCH] fix handling of SIGCHILD from reaped child

From: Roland McGrath
Date: Tue Feb 20 2007 - 18:11:09 EST


I'm usually the stickler for anal POSIX compliance, but this is one thing
that I did notice a while ago, realized Linux had never done it, and
decided I didn't care.

This is one of those parts of the standard that was originally written in a
single-threaded process frame of mind, and was never amended or clarified
later when multi-threaded semantics got well-specified in the standard.

It's clear what the requirement is trying to achieve. It lets you have a
SIGCHLD signal handler that calls wait, and be sure its call never blocks,
as long as you block SIGCHLD while making any other wait calls. But Linux
has never done this even for single-threaded processes, so existing
application code already has to cope with the race. (Anyway, this
guarantee is not all that helpful if you have more than one child and so
might be running the handler once after SIGCHLD was generated more than
once. You can't just use WNOHANG in your handler because you aren't
actually guaranteed that the zombie is ready already when you get the SIGCHLD.)

This guarantee is not of any use when there might be other threads with
SIGCHLD unblocked or other threads that call wait* functions (calls that
draw from the same pool of PIDs anyway). There can always be another
thread that just dequeued the SIGCHLD but hasn't gotten into its handler
yet, so clearing the pending SIGCHLD doesn't really cover it.

Unhelpful as it is the multithreaded context, I think it's clear that the
standard's wording means "when SIGCHLD is blocked by the thread calling
wait", but in fact as to being a guarantee it's only meaningful when
SIGCHLD is blocked by all threads. The mention of blocking the signal is
only there to remind you that well-defined semantics about a "pending"
signal only ever apply when the signal is blocked. If any thread has it
unblocked, then "pending" is an ephemeral condition not necessarily
observable at all--as soon as you could say it's pending, some such thread
might be handling it.

The "if there is another child available" test is rather ugly to do
correctly now. It would be less so if the children list moved into
signal_struct and was just shared directly. The most "correct" it can get
is still not all that useful in a multithreaded context. So I'm pretty
ambivalent about bothering with this.


Thanks,
Roland

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/