[PATCH] fix for zeroed user-space tids in multi-threaded core dumps

From: Ernie Petrides
Date: Wed Oct 25 2006 - 21:43:06 EST


Hi, Andrew. Please consider the patch below for the next open 2.6 release.

The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls
on behalf of pthread_create() library calls. This feature is used to
request that the kernel clear the thread-id in user space (at an address
provided in the syscall) when the thread disassociates itself from the
address space, which is done in mm_release().

Unfortunately, when a multi-threaded process incurs a core dump (such as
from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of
the other threads, which then proceed to clear their user-space tids
before synchronizing in exit_mm() with the start of core dumping. This
misrepresents the state of process's address space at the time of the
SIGSEGV and makes it more difficult for someone to debug NPTL and glibc
problems (misleading him/her to conclude that the threads had gone away
before the fault).

The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a
core dump has been initiated.

Cheers. -ernie



--- linux-2.6.18/kernel/fork.c.orig
+++ linux-2.6.18/kernel/fork.c
@@ -439,7 +439,9 @@ void mm_release(struct task_struct *tsk,
tsk->vfork_done = NULL;
complete(vfork_done);
}
- if (tsk->clear_child_tid && atomic_read(&mm->mm_users) > 1) {
+ if (tsk->clear_child_tid &&
+ mm->core_waiters == 0 &&
+ atomic_read(&mm->mm_users) > 1) {
u32 __user * tidptr = tsk->clear_child_tid;
tsk->clear_child_tid = NULL;

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/