kernel reports kill too late?

From: Klaus Dittrich
Date: Thu Nov 18 2004 - 07:36:48 EST


With kernels newer 2.6.10.rc1-bk18 glibc nptl checks failed.

Now this is fixed (See lkml "Futex queue_me/get_user ordering")
another problem showed off.

I get spurious "Failed to kill test process: No child processes"
errors during nptl checks now.

Appended is a simple test-prog derived from glibc's check which
runs fine on 2.6.10.rc1-bk18 but failed most of the time with
newer kernels.

(Not beeing a guru at all) I interpret the results as a delay
or loss of the status of a killed process which happens only
if this process runs a thread.
(Not calling sleep_mostly() as a thread works as expected.)

Further noteworthy: it happens not always.



Output on 2.6.10.rc1-bk18
thread starts spinning
thread alive
try to kill pid 1436
killed after 4007 waitpid calls
killed by signal: status = 0

Output on 2.6.10.rc2-bk2
test 0 --------------------------------
thread starts spinning
thread alive
try to kill pid 32321
killed after 3389 waitpid calls
killed by signal: status = 0
test 1 --------------------------------
thread starts spinning
thread alive
try to kill pid 32323
killed after 17 waitpid calls
killed by signal: status = 138
Kill failed! waitpid returned -1

Both systems are smp 2xP4 and 2xP3.

Can anyone else see this?

--
Regards Klaus


#include <sys/wait.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <pthread.h>


static void print_exit_status (int status) {

if(WIFEXITED(status)) {
printf("killed normal : status = %d\n", WEXITSTATUS(status));
} else if (WIFSIGNALED(status)) {
printf("killed by signal: status = %d\n", WEXITSTATUS(status));
}
}

static void * sleep_mostly (void * arg)
{
printf("thread starts spinning\n");

while (1) {
sleep (1);
printf ("thread alive\n");
}

/* NOTREACHED */
return NULL;
}

static int do_fork (void)
{
pid_t pid = fork ();

if (pid == 0) { /* child */

pthread_t th;
int cr = pthread_create (&th, NULL, sleep_mostly, (void*)NULL);

if (cr != 0) {
fprintf (stderr, "Thread creation failed %s\n", strerror(cr));
exit (1);
}

sleep(3);
pthread_exit (NULL);
} else if (pid < 0) {
fprintf (stderr, "Cannot fork!\n");
exit (1);
}
return pid;
}

static int test_wait_pid(void) {

int pid = do_fork();

sleep(2);
fprintf (stderr, "try to kill pid %d\n", pid);
kill (pid, SIGKILL);

int killed;
int status;
int n;

for (n = 1; n < 9999; n++) {
killed = waitpid (pid, &status, WNOHANG|WUNTRACED);
if (killed != 0)
break;
}

fprintf (stderr, "killed after %d waitpid calls\n", n);
print_exit_status(status);

if (killed != 0 && killed != pid) {
fprintf (stderr, "Kill failed! waitpid returned %d\n", killed);
exit (1);
}

return 0;
}

int main (int argc, char* argv[]) {

int i = 0;
do {
printf ("test %d --------------------------------\n", i);
test_wait_pid();
i++;
} while ( i < 10);
return 0;
}