Re: [PATCH] sched: __fatal_signal_pending() should also check PF_EXITING

From: Tycho Andersen
Date: Wed Jul 27 2022 - 15:41:43 EST


On Wed, Jul 27, 2022 at 09:19:50PM +0200, Oleg Nesterov wrote:
> On 07/27, Tycho Andersen wrote:
> >
> > On Wed, Jul 27, 2022 at 07:55:39PM +0200, Oleg Nesterov wrote:
> > > On 07/27, Tycho Andersen wrote:
> > > >
> > > > Hi all,
> > > >
> > > > On Wed, Jul 20, 2022 at 08:54:59PM -0500, Serge E. Hallyn wrote:
> > > > > Oh - I didn't either - checking the sigkill in shared signals *seems*
> > > > > legit if they can be put there - but since you posted the new patch I
> > > > > assumed his reasoning was clear to you. I know Eric's busy, cc:ing Oleg
> > > > > for his interpretation too.
> > > >
> > > > Any thoughts on this?
> > >
> > > Cough... I don't know what can I say except I personally dislike this
> > > patch no matter what ;)
> > >
> > > And I do not understand how can this patch help. OK, a single-threaded
> > > PF_EXITING task sleeps in TASK_KILLABLE. send_signal_locked() won't
> > > wake it up anyway?
> > >
> > > I must have missed something.
> >
> > What do you think of the patch in
> > https://lore.kernel.org/all/YsyHMVLuT5U6mm+I@netflix/ ? Hopefully that
> > has an explanation that makes more sense.
>
> Sorry, I still do not follow. Again, I can easily miss something. But how
> can ANY change in __fatal_signal_pending() ensure that SIGKILL will wakeup
> a PF_EXITING task which already sleeps in TASK_KILLABLE state? or even set
> TIF_SIGPENDING as the changelog states?

__fatal_signal_pending() just checks the non-shared set:

	sigismember(&p->pending.signal, SIGKILL)

When init in a pid namespace dies, it calls zap_pid_ns_processes(),
which does:

	group_send_sig_info(SIGKILL, SEND_SIG_PRIV, task, PIDTYPE_MAX);

that eventually gets to __send_signal_locked() which does:

	pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;

i.e. it puts the signal in the shared set instead of the per-thread set,
so the sigismember() check above never sees it. If we change
__fatal_signal_pending() to look in the shared set too, the killable
waits will notice the SIGKILL and bail out in this case.
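
To make the mismatch concrete, here is a small userspace model of it. This
is only a sketch, not kernel code: the model_* names, the struct layout and
the plain bitmasks are made up for illustration, and the "with shared"
check is just my reading of what the proposed change amounts to.

```c
#include <stdbool.h>

#define MODEL_SIGKILL      9
#define MODEL_SIGKILL_BIT  (1u << (MODEL_SIGKILL - 1))

/* Mirrors only the one distinction that matters in this model. */
enum model_pid_type { MODEL_PIDTYPE_PID, MODEL_PIDTYPE_MAX };

/* Hypothetical stand-in for the two pending sets of a task. */
struct model_task {
	unsigned int pending;         /* models t->pending.signal */
	unsigned int *shared_pending; /* models t->signal->shared_pending.signal */
};

/* Models the set selection quoted above from __send_signal_locked(). */
static void model_send_sigkill(struct model_task *t, enum model_pid_type type)
{
	unsigned int *set = (type != MODEL_PIDTYPE_PID) ? t->shared_pending
							 : &t->pending;

	*set |= MODEL_SIGKILL_BIT;
}

/* Models the current check: per-thread set only. */
static bool model_fatal_private_only(const struct model_task *t)
{
	return (t->pending & MODEL_SIGKILL_BIT) != 0;
}

/* Models the proposed check: per-thread set or shared set. */
static bool model_fatal_with_shared(const struct model_task *t)
{
	return ((t->pending | *t->shared_pending) & MODEL_SIGKILL_BIT) != 0;
}
```

With a group-wide send (anything other than the PIDTYPE_PID case), the
private-only check stays false while the check that also looks at the
shared set turns true, which is the gap described above.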

Maybe it should be fixed somehow by complete_signal(), but that doesn't work if
the thread is already PF_EXITING: wants_signal() returns false for such a task,
so complete_signal() skips it and it remains stuck forever.
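
The same thing as a toy model, again only a sketch: the model_* names are
hypothetical, and the target selection is heavily simplified compared to
the real complete_signal(). The only point is that an exiting thread is
never picked, so a group whose sole thread is exiting gets no wakeup.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-thread state that matters here. */
struct model_thread {
	bool blocked_sigkill; /* models SIGKILL being in p->blocked */
	bool exiting;         /* models p->flags & PF_EXITING */
	bool woken;           /* set when the model picks this thread */
};

/* Models the wants_signal() decision for SIGKILL only. */
static bool model_wants_signal(const struct model_thread *t)
{
	if (t->blocked_sigkill)
		return false;
	if (t->exiting)
		return false; /* the case described above */
	return true;
}

/*
 * Simplified target selection: wake the first thread that wants the
 * signal. Returns false when no thread qualifies, in which case nobody
 * is woken.
 */
static bool model_complete_signal(struct model_thread *threads, int n)
{
	for (int i = 0; i < n; i++) {
		if (model_wants_signal(&threads[i])) {
			threads[i].woken = true;
			return true;
		}
	}
	return false;
}
```

For a single-threaded task that is already exiting, the model finds no
target, so the signal sits in the shared set with nobody woken to look
at it.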

Does that make sense? Maybe it's me who is missing something. I have a
reproducer here:
https://github.com/tych0/kernel-utils/tree/master/fuse2

Tycho