Re: [PATCH] sched: __fatal_signal_pending() should also check PF_EXITING

From: Oleg Nesterov
Date: Thu Jul 28 2022 - 05:12:46 EST


On 07/27, Tycho Andersen wrote:
>
> On Wed, Jul 27, 2022 at 09:19:50PM +0200, Oleg Nesterov wrote:
> >
> > Sorry, I still do not follow. Again, I can easily miss something. But how
> > can ANY change in __fatal_signal_pending() ensure that SIGKILL will wakeup
> > a PF_EXITING task which already sleeps in TASK_KILLABLE state? or even set
> > TIF_SIGPENDING as the changelog states?
>
> __fatal_signal_pending() just checks the non-shared set:
>
> sigismember(&p->pending.signal, SIGKILL)
>
> When init in a pid namespace dies, it calls zap_pid_ns_processes(),
> which does:
>
> group_send_sig_info(SIGKILL, SEND_SIG_PRIV, task, PIDTYPE_MAX);
>
> that eventually gets to __send_signal_locked() which does:
>
> pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;
>
> i.e. it decides to put the signal in the shared set, instead of the individual
> set. If we change __fatal_signal_pending() to look in the shared set too, it
> will exit all the wait code in this case.

This is clear, but it seems you do not understand me. Let me try again
to explain and please correct me if I am wrong.

To simplify, lets suppose we have a single-thread task T which simply
does
__set_current_state(TASK_KILLABLE);
schedule();

in the do_exit() paths after exit_signals() which sets PF_EXITING. Btw,
note that it even documents that this thread is not "visible" for the
group-wide signals, see below.

Now, suppose that this task is running and you send SIGKILL. T will
dequeue SIGKILL from T->penging and call do_exit(). However, it won't
remove SIGKILL from T->signal.shared_pending(), and this means that
signal_pending(T) is still true.

Now. If we add a PF_EXITING or sigismember(shared_pending, SIGKILL) check
into __fatal_signal_pending(), then yes, T won't block in schedule(),
schedule()->signal_pending_state() will return true.

But what if T exits on its own? It will block in schedule() forever.
schedule()->signal_pending_state() will not even check __fatal_signal_pending(),
signal_pending() == F.

Now if you send SIGKILL to this task, SIGKILL won't wake it up or even
set TIF_SIGPENDING, complete_signal() will do nothing.

See?

I agree, we should probably cleanup this logic and define how exactly
the exiting task should react to signals (not only fatal signals). But
your patch certainly doesn't look good to me and it is not enough.
May be we can change get_signal() to not remove SIGKILL from t->pending
for the start... not sure, this needs another discussion.


Finally. if fuse_flush() wants __fatal_signal_pending() == T when the
caller exits, perhaps it can do it itself? Something like

if (current->flags & PF_EXITING) {
spin_lock_irq(siglock);
set_thread_flag(TIF_SIGPENDING);
sigaddset(&current->pending.signal, SIGKILL);
spin_unlock_irq(siglock);
}

Sure, this is ugly as hell. But perhaps this can serve as a workaround?

Oleg.