Re: [PATCH] fix de_thread() vs do_coredump() deadlock

From: Oleg Nesterov
Date: Tue Apr 11 2006 - 05:16:56 EST


On 04/11, Roland McGrath wrote:
>
> > A fatal signal is placed to ->shared_pending in any (non tkill) case, so I
> > think it is not lost (but may be unnoticed for a while).
> >
> > sig_kernel_coredump() is different. It could be stealed by one of sub-threads
> > while another one does de_thread(), that is why it could be lost.
>
> I am not talking about the case where it's still pending on either queue.
> Those are fine as they are. What matters is when it's been dequeued, in
> the race window afer releasing the siglock in get_signal_to_deliver.
> There are two windows of race here.
>
> The first one is only when ptrace'd, and doesn't even require a race that
> takes good timing to create. The window is in ptrace_stop when the siglock
> is released, including all the time stopped in TASK_TRACED. Say another
> thread does an exec (and de_thread) while we're in TASK_TRACED after
> reporting a death signal to the debugger. The SIGKILL wakes us out of
> ptrace_stop. Assuming the debugger wasn't racing with a PTRACE_CONT too,
> then the signal remains in ->exit_code and (assuming the SIGNAL_GROUP_EXIT
> check there reverted, as I mentioned before), we just come out of the
> ptrace path with the siglock held as if we'd dequeued the signal without ptrace.

Yes, you are right, thanks.

> Then comes the second window. With no ptrace, or after ptrace, we've
> dequeued the signal and if it's a SIG_DFL fatal signal, we release the
> siglock. Here a non-coredump signal just calls do_group_exit. Meanwhile,
> a racing exec comes along and sets SIGNAL_GROUP_EXIT (or it already did
> earlier while we were in ptrace_stop). In do_group_exit, we will see that
> SIGNAL_GROUP_EXIT is set, and just do_exit ourselves with the group_exit_code.
> When it's an exec rather than a real exit, we've swallowed the signal.
> This is no different than the coredump case. (When do_coredump bails out,
> then it joins this very same code path.)

I still think we are ok with no ptrace. If that (non-coredump) signal was
delivered before de_thread sets SIGNAL_GROUP_EXIT, then this flag is set
by __group_complete_signal(), so de_thread return -EAGAIN. If de_thread()
wins, the signal will be dequeued later from ->shared_pending.

tkill() is different, but I don't see any problem here.

Oleg.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/