Re: [RESEND][RFC PATCH v2] waitfd

From: Denys Vlasenko
Date: Tue Mar 01 2011 - 20:38:03 EST


On Sat, Jan 10, 2009 at 4:57 PM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> On 01/10, Scott James Remnant wrote:
>> On Wed, 2009-01-07 at 12:53 -0800, Roland McGrath wrote:
>> > Do we really need another one for this?  How about using signalfd plus
>> > setting the child's exit_signal to a queuing (SIGRTMIN+n) signal instead of
>> > SIGCHLD?  It's slightly more magical for the userland process to know to do
>> > that (fork -> clone SIGRTMIN).  But compared to adding a syscall we don't
>> > really have to add, maybe better.
>> >
>> This wouldn't help the init daemon case:
>>
>> - the exit_signal is set on the child, not on the parent.
>>
>>   While the init daemon could clone() every new process and set
>>   exit_signal, this would not be set for processes reparented to init.
>>
>>   Even if we had a new syscall to change the exit_signal of a given
>>   process, *and* had the init reparent notification patch, this still
>>   wouldn't be sufficient; you'd have a race condition between the time
>>   you were notified of the reparent, and the time you set exit_signal,
>>   in which the child could die.
>>
>>   Since exit_signal is always reset to SIGCHLD before reparenting, this
>>   could be done by resetting it to a different signal; but at this point
>>   we're getting into a rather twisty method full of traps.
>>
>> - exit_signal is reset to SIGCHLD on exec().
>>
>>   Pretty much a plan-killer ;)
>
> I can't understand why should we change ->exit_signal if we want to
> use signalfd. Yes, SIGCHLD is not rt. So what?
>
> We do not need multiple signals in queue if we want to reap multiple
> zombies. Once we have a single SIGCHLD (reported by signalfd or
> whatever) we can do do_wait(WNOHANG) in a loop.
>
> Confused.

I know I am terribly late for the party :)

"do_wait(WNOHANG) in a loop" is a performance problem.

Oleg, do you remember that strace bug where strace was swamped
with gazillions of stop notifications from a multithreaded
task? Dealing with them one by one caused unfairness and,
ultimately, a "this program never finishes when run under
strace" bug.

Another typical nuisance is that running multithreaded
stuff under strace is much slower, even with the -e option,
which limits the set of decoded syscalls.

Having waitfd would help in both cases: strace could gulp
a lot of waitpid notifications in one go and
batch-process them.

--
vda