Re: [PATCH 3/3] rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes()

From: Frederic Weisbecker
Date: Fri Dec 02 2022 - 17:54:39 EST


On Wed, Nov 30, 2022 at 12:37:15PM -0600, Eric W. Biederman wrote:
> Frederic Weisbecker <frederic@xxxxxxxxxx> writes:
> Two questions.
>
> 1) Is there any chance you need the exit_task_rcu_stop() and
> exit_tasks_rcu_start() around schedule in the part of this code that
> calls kernel_wait4?

Indeed it could be relaxed there too if necessary.

>
> 2) I keep thinking zap_pid_ns_processes() should be changed so that,
> after it sends SIGKILL to all of the relevant processes, it does not
> wait, and wait_consider_task instead simply does not allow the
> init process of the pid namespace to be reaped.
>
> Am I right in thinking that if such a change were to be made it would
> remove the deadlock without having to have any special code?
>
> It is just tricky enough to do that I don't want to discourage your
> simpler change, but this looks like a case that makes the pain of
> changing zap_pid_ns_processes worthwhile in practice.

So you mean it still reaps those that were EXIT_ZOMBIE before ignoring
SIGCHLD (the kernel_wait4() call) but it doesn't sleep anymore on those
that autoreap (or get reaped by a parent outside that namespace) after
ignoring SIGCHLD? Namely it doesn't do the schedule() loop I'm working
around here and proceeds with exit_notify() and notifies its parent?

And then in this case the responsibility of sleeping until the init_process
of the namespace is the last task in the namespace goes to the parent, while
it waits for that init_process, right?

But what if the init_process of the given namespace autoreaps? Should it then
wait itself until the namespace is empty? And then aren't we back to the initial
issue?

Thanks.