Re: [PATCH v11 8/8] vhost: use vhost_tasks for worker threads

From: Mike Christie
Date: Fri May 05 2023 - 18:38:10 EST


On 5/5/23 1:22 PM, Linus Torvalds wrote:
> On Fri, May 5, 2023 at 6:40 AM Nicolas Dichtel
> <nicolas.dichtel@xxxxxxxxx> wrote:
>>
>> Is this an intended behavior?
>> This breaks some of our scripts.
>
> It doesn't just break your scripts (which counts as a regression), I
> think it's really wrong.
>
> The worker threads should show up as threads of the thing that started
> them, not as processes.
>
> So they should show up in 'ps' only when one of the "show threads" flag is set.
>
> But I suspect the fix is trivial: the virtio code should likely use
> CLONE_THREAD for the copy_process() it does.
>
> It should look more like "create_io_thread()" than "copy_process()", I think.
>
> For example, do virtio worker threads really want their own signals
> and files? That sounds wrong. create_io_thread() uses all of
>
> CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_IO
>
> to share much more of the context with the process it is actually run within.
>

For the vhost tasks and the CLONE flags:

1. I didn't use CLONE_FILES in the vhost task patches because you are right
and we didn't need our own. We needed it to work like kthreads where there
are no files, so I set the kernel_clone_args.no_files bit to have copy_files
not do a dup or clone (task->files is NULL).

2. vhost tasks didn't use CLONE_SIGHAND, because userspace apps like qemu use
signals for management operations. But, the vhost thread's worker functions
assume signals are ignored like they were with kthreads. So if they were doing
IO and got a signal like a SIGHUP they might return early and fail from whatever
network/block function they were calling. And currently the parent like qemu
handles something like a SIGSTOP by shutting everything down by calling into
the vhost interface to remove the device.

So similar to files I used the kernel_clone_args.ignore_signals bit so
copy_process has the vhost thread have it's own signal handle that just ignores
signals.

3. I didn't use CLONE_THREAD because before my patches you could do
"ps -u root" and see all the vhost threads. If we use CLONE_THREAD, then we
can only see it when we do something like "ps -T -p $parent" like you mentioned
above. I guess I messed up and did the reverse and thought it would be a
regression if "ps -u root" no longer showed the vhost threads.

If it's ok to change the behavior of "ps -u root", then we can do this patch:
(Nicolas, I confirmed it fixes the 'ps a' case, but couldn't replicate the 'ps'
case. If you could test the ps only case or give me info on what /usr/bin/example
was doing I can replicate and test here):


diff --git a/kernel/fork.c b/kernel/fork.c
index ed4e01daccaa..eb9ffc58e211 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2269,8 +2269,14 @@ __latent_entropy struct task_struct *copy_process(
/*
* Thread groups must share signals as well, and detached threads
* can only be started up within the thread group.
+ *
+ * A userworker's parent thread will normally have a signal handler
+ * that performs management operations, but the worker will not
+ * because the parent will handle the signal then user a worker
+ * specific interface to manage the thread and related resources.
*/
- if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
+ if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND) &&
+ !args->user_worker && !args->ignore_signals)
return ERR_PTR(-EINVAL);

/*
diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
index b7cbd66f889e..3700c21ea39d 100644
--- a/kernel/vhost_task.c
+++ b/kernel/vhost_task.c
@@ -75,7 +78,8 @@ struct vhost_task *vhost_task_create(int (*fn)(void *), void *arg,
const char *name)
{
struct kernel_clone_args args = {
- .flags = CLONE_FS | CLONE_UNTRACED | CLONE_VM,
+ .flags = CLONE_FS | CLONE_THREAD | CLONE_VM |
+ CLONE_UNTRACED,
.exit_signal = 0,
.fn = vhost_task_fn,
.name = name,