Re: + prctl-add-pr_setget_child_reaper-to-allow-simple-process-supervision.patch added to -mm tree

From: Kay Sievers
Date: Wed Aug 17 2011 - 09:14:01 EST


On Wed, Aug 17, 2011 at 13:55, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> On 08/16, Andrew Morton wrote:
>> From: Lennart Poettering <lennart@xxxxxxxxxxxxxx>
>>
>> Userspace service managers/supervisors need to track their started
>> services. ÂMany services daemonize by double-forking and get implicitely
>> re-parented to PID 1. ÂThe process manager will no longer be able to
>> receive the SIGCHLD signals for them.
>>
>> With this prctl, a service manager can mark itself as a sort of 'sub-init'
>> process, able to stay as the parent process for all processes created by
>> the started services. ÂAll SIGCHLD signals will be delivered to the
>> service manager.
>
> I try to never argue with the new features. But to be honest, this
> doesn't look very good to me.
>
> OK, a service manager M does prctl(PR_SET_CHILD_REAPER), then it forks
> a service X which forks another child C and exits. Then C exits and
> notifies M.
>
> But. How can M know that the service X should be restarted? It only
> knows the pid.

Legacy services write pid files and we read them, so we know the pid
to watch for. Proper services never double-fork and reparent in a
modern init environment.

> What if wait(WEXITED) succeeds because C in turn does
> fork + exit?

Nothing is really doing this. There are a few hacks out there that do
that on service update instead of just restarting, and such services
explicitly need to tell their new pid, up or they just need to disable
supervision if they can't behave. :)

> What M has 2 or more services?

Nothing different from one service. They are all tracked the same way.

> Anyway, the implementation is certainly buggy.
>
>> @@ -1296,6 +1296,8 @@ struct task_struct {
>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â* execve */
>> Â Â Â unsigned in_iowait:1;
>>
>> + Â Â /* Reparent child processes to this process instead of pid 1. */
>> + Â Â unsigned child_reaper:1;
>
> First of all - this is already very wrong imho. This should be
> per-process, not per-thread.

What do you mean? That would go where instead?

>> + Â Â /* find the first ancestor which is marked as child_reaper */
>> + Â Â for (reaper = father->parent;
>> + Â Â Â Â Âreaper != &init_task && reaper != pid_ns->child_reaper;
>> + Â Â Â Â Âreaper = reaper->parent)
>
> This loop can never reach init_task/child_reaper and crash the kernel.

You mean: *if* this loop can never ...?

> For example, father->parent can point to init_task's sub-thread.
>
> OTOH you shouldn't use init_task at all.

What would we use instead?

> Also. You shouldn't do this if the sub-namespace init exits, this is
> wrong.

It we find a sub-init, before the namespace PID1, why wouldn't we return it?

>> + Â Â Â Â Â Â if (reaper->child_reaper)
>> + Â Â Â Â Â Â Â Â Â Â return reaper;
>
> No, we can't blindly return this task, it can be dead/exiting. More
> precisely, we must not do this if it has already passed
> forget_original_parent(). That is why the code above checks PF_EXITING.

Ok.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/