fork/wait race in 2.4.0-pre?

From: Adam J. Richter (adam@yggdrasil.com)
Date: Sat Dec 23 2000 - 10:32:17 EST


        I reported this problem a few months ago in bug-glibc and
did not get any response, although that is not unexpected since it is
unclear where the problem is. So that bug report and this report
will probably serve just to chronicle the problem in case anybody
sees something similar.

        Anyhow, the problem is that somehow fork or vfork (makes no
difference) will return an apparently valid pid and then the child
process will disappear. Calling wait or waitpid then fails with errno 10
(ECHILD, "No child processes"), and keeps failing with errno 10
if wait or waitpid is called again. I got lucky with some strategically
placed printf's at a point where this problem sometimes appears and
was able to determine that, at least when wait() is called, the
signal handler for SIGCLD (17) is SIG_IGN (1), so it seems less
likely that some userland facility is reaping the process, especially
since one of the places where this problem occurs is a very simple
program that does little more than fork and wait.

        This usually happens during the "configure" phase of our
build process, which is right after about 2.5GB of sources
have been extracted from CVS to a directory tree, so there may
be some IO congestion that could lead to unusual timing relationships,
leading to unusual results from race conditions. Also, the problem
started occurring occasionally when the machine in question got
an 866MHz CPU, and started occurring more often when it got a 1GHz
CPU. So, more instructions per time slice seems to be a relevant
factor.

        Anyhow, I know this is a very slippery bug and it may
be months before it is tracked down either here or elsewhere, but
I thought it would be helpful to at least document it for the
linux-kernel archives.

Adam J. Richter     __     ______________   4880 Stevens Creek Blvd, Suite 104
adam@yggdrasil.com     \ /                  San Jose, California 95129-1034
+1 408 261-6630         | g g d r a s i l   United States of America
fax +1 408 261-6631      "Free Software For The Rest Of Us."