Re: init's children list is long and slows reaping children.

From: Eric W. Biederman
Date: Sat Apr 07 2007 - 20:40:49 EST


Oleg Nesterov <oleg@xxxxxxxxxx> writes:

> On 04/06, Oleg Nesterov wrote:
>>
>> Perhaps,
>>
>> --- t/kernel/exit.c~ 2007-04-06 23:31:31.000000000 +0400
>> +++ t/kernel/exit.c 2007-04-06 23:31:57.000000000 +0400
>> @@ -275,10 +275,7 @@ static void reparent_to_init(void)
>> remove_parent(current);
>> current->parent = child_reaper(current);
>> current->real_parent = child_reaper(current);
>> - add_parent(current);
>> -
>> - /* Set the exit signal to SIGCHLD so we signal init on exit */
>> - current->exit_signal = SIGCHLD;
>> + current->exit_signal = -1;
>>
>> if (!has_rt_policy(current) && (task_nice(current) < 0))
>> set_user_nice(current, 0);
>>
>> is enough. init is still our parent (make ps happy), but it can't see us,
>> we are not on ->children list.
>
> OK, this doesn't work. A multi-threaded init may do execve().

Good catch. daemonize must die!

> So, we can re-parent a kernel thread to swapper. In that case it doesn't matter
> if we put task on ->children list or not.

Yes. We can.

> User-visible change. Acceptable?

We would have user visible changes when we changed to ktrhead_create anyway,
and it's none of user space's business so certainly.

>> Off course, we also need to add preparent_to_init() to kthread() and
>> (say) stopmachine(). Or we can create kernel_thread_detached() and
>> modify callers to use it.
>
> It would be very nice to introduce CLONE_KERNEL_THREAD instead, then

If we are going to do something to copy_process and the like let's take
this one step farther. Let's pass in the value of the task to copy.

Then we can add a wrapper around copy_process to build kernel_thread
something like:

struct task_struct *__kernel_thread(int (*fn)(void *), void * arg,
unsigned long flags)
{
struct task_struct *task;
struct pt_regs regs, *reg;

reg = kernel_thread_regs(&regs, fn, arg);
task = copy_process(&init_task, flags, 0, reg, 0, NULL, NULL, NULL, 0);
if (!IS_ERR(task))
wake_up_new_task(task, flags);

return task;
}

long kernel_thread(int (*fn)(void *), void * arg, unsigned long flags)
{
struct task_struct *task;
task = __kernel_thread(fn, arg, flags);
if (IS_ERR(task))
return PTR_ERR(task);
return task->pid;
}

After that daemonize just becomes:

void daemonize(const char *name, ...)
{
va_list args;

va_start(args, name);
vsnprintf(current->comm, sizeof(current->comm), name, args);
va_end(args);
}

And kthread_create becomes:

struct task_struct *kthread_create(int (*threadfn)(void *data),
void *data,
const char namefmt[],
...)
{
struct kthread_create_info create;
struct task_struct *task;

create.threadfn = threadfn;
create.data = data;

/* We want our own signal handler (we take no signals by default). */
task = __kernel_thread(kthread, create, CLONE_FS | CLONE_FILES | SIGCHLD);
if (!IS_ERR(task)) {
va_list args;
va_start(args, namefmt);
vsnprintf(task->comm, sizeof(task->comm), namefmt, args);
va_end(args);
}
return task;
}

If we are willing to go that far. I think it is worth it to touch the
architecture specific code. As that removes an unnecessary wait
queue, and ensures that kernel threads always start with a consistent
state. So it is a performance, scalability and simplicity boost.

Otherwise we should just put the code in the callers of kernel_thread
like we do today.

I don't have the energy to rework all of th architecture or even a
noticeable fraction of them right now. I have to much on my plate.
A non-architecture specific solution looks like a fairly simple
patch and I might get around to it one of these times..

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/