Re: [PATCH v11 8/8] vhost: use vhost_tasks for worker threads

From: Eric W. Biederman
Date: Tue May 16 2023 - 11:57:02 EST


Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> On Mon, May 15, 2023 at 3:23 PM Mike Christie
> <michael.christie@xxxxxxxxxx> wrote:
>>
>> The vhost layer really doesn't want any signals and wants to work like kthreads
>> for that case. To make it really simple can we do something like this where it
>> separates user and io worker behavior where the major diff is how they handle
>> signals and exit. I also included a fix for the freeze case:
>
> I don't love the SIGKILL special case, but I also don't find this
> deeply offensive. So if this is what it takes, I'm ok with it.
>
> I wonder if we could make that special case simply check for "is
> SIGKILL blocked" instead? No normal case will cause that, and it means
> that a PF_USER_WORKER thread could decide per-thread what it wants to
> do wrt SIGKILL.

A kernel thread can block SIGKILL and that is supported.

For a thread that is part of a process you can't block SIGKILL when the
task is part of a user mode process.

There is this bit in complete_signal when SIGKILL is delivered to any
thread in the process.


/*
* Start a group exit and wake everybody up.
* This way we don't have other threads
* running and doing things after a slower
* thread has the fatal signal pending.
*/
signal->flags = SIGNAL_GROUP_EXIT;
signal->group_exit_code = sig;
signal->group_stop_count = 0;
t = p;
do {
task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
sigaddset(&t->pending.signal, SIGKILL);
signal_wake_up(t, 1);
} while_each_thread(p, t);

For clarity that sigaddset(&t->pending.signal, SIGKILL); Really isn't
setting SIGKILL pending, it is part of the short circuit delivery logic,
and that sigaddset(SIGKILL) is just setting a flag to tell the process
it needs to die.


The important part of that code is that SIGNAL_GROUP_EXIT gets set.
That indicates the entire process is being torn down.

Where this becomes important is exit_notify and release_task work
together to ensure that the first thread in the process (a user space
thread that can not block SIGKILL) will not send SIGCHLD to it's parent
process until every thread in the process has exited.

The delay_group_leader logic in wait_consider_task part of wait(2) has
the same logic.

Having been through this with io_uring the threads really need to call
get_signal to handle that case.


This is pretty much why I said at the outset you they needed to decided
if they were going to implement a thread or if they were going to be a
process. Changing the decision to be a thread from a process is fine
but in that case the vhost logic needs to act like a process, just
like io_uring does.


> Christian? And I guess we should Cc: Oleg too, since the signal parts
> are an area he's familiar with and has worked on.. Eric Biederman has
> already been on the list and has also been involved

>
> Oleg: see
>
> https://lore.kernel.org/lkml/122b597e-a5fa-daf7-27bb-6f04fa98d496@xxxxxxxxxx/
>
> for the context here.

Eric