Re: [PATCH v11 8/8] vhost: use vhost_tasks for worker threads

From: Christian Brauner
Date: Mon May 08 2023 - 13:13:42 EST


On Fri, May 05, 2023 at 05:37:40PM -0500, Mike Christie wrote:
> On 5/5/23 1:22 PM, Linus Torvalds wrote:
> > On Fri, May 5, 2023 at 6:40 AM Nicolas Dichtel
> > <nicolas.dichtel@xxxxxxxxx> wrote:
> >>
> >> Is this an intended behavior?
> >> This breaks some of our scripts.
> >
> > It doesn't just break your scripts (which counts as a regression), I
> > think it's really wrong.
> >
> > The worker threads should show up as threads of the thing that started
> > them, not as processes.
> >
> > So they should show up in 'ps' only when one of the "show threads" flag is set.
> >
> > But I suspect the fix is trivial: the virtio code should likely use
> > CLONE_THREAD for the copy_process() it does.
> >
> > It should look more like "create_io_thread()" than "copy_process()", I think.
> >
> > For example, do virtio worker threads really want their own signals
> > and files? That sounds wrong. create_io_thread() uses all of
> >
> > CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_IO
> >
> > to share much more of the context with the process it is actually run within.
> >
>
> For the vhost tasks and the CLONE flags:
>
> 1. I didn't use CLONE_FILES in the vhost task patches because you are right
> and we didn't need our own. We needed it to work like kthreads where there
> are no files, so I set the kernel_clone_args.no_files bit to have copy_files
> not do a dup or clone (task->files is NULL).
>
> 2. vhost tasks didn't use CLONE_SIGHAND, because userspace apps like qemu use
> signals for management operations. But, the vhost thread's worker functions
> assume signals are ignored like they were with kthreads. So if they were doing
> IO and got a signal like a SIGHUP they might return early and fail from whatever
> network/block function they were calling. And currently the parent like qemu
> handles something like a SIGSTOP by shutting everything down by calling into
> the vhost interface to remove the device.
>
> So similar to files I used the kernel_clone_args.ignore_signals bit so
> copy_process has the vhost thread have it's own signal handle that just ignores
> signals.
>
> 3. I didn't use CLONE_THREAD because before my patches you could do
> "ps -u root" and see all the vhost threads. If we use CLONE_THREAD, then we
> can only see it when we do something like "ps -T -p $parent" like you mentioned
> above. I guess I messed up and did the reverse and thought it would be a
> regression if "ps -u root" no longer showed the vhost threads.
>
> If it's ok to change the behavior of "ps -u root", then we can do this patch:
> (Nicolas, I confirmed it fixes the 'ps a' case, but couldn't replicate the 'ps'
> case. If you could test the ps only case or give me info on what /usr/bin/example
> was doing I can replicate and test here):
>
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index ed4e01daccaa..eb9ffc58e211 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2269,8 +2269,14 @@ __latent_entropy struct task_struct *copy_process(
> /*
> * Thread groups must share signals as well, and detached threads
> * can only be started up within the thread group.
> + *
> + * A userworker's parent thread will normally have a signal handler
> + * that performs management operations, but the worker will not
> + * because the parent will handle the signal then user a worker
> + * specific interface to manage the thread and related resources.
> */
> - if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
> + if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND) &&
> + !args->user_worker && !args->ignore_signals)
> return ERR_PTR(-EINVAL);

I'm currently traveling due to LSFMM so that's why my responses will be
delayed this week. I'm not yet clear if CLONE_THREAD without
CLONE_SIGHAND is safe. If there's code that assumes that
$task_in_threadgroup->sighand->siglock always covers all threads in the
threadgroup then this change would break this assumption?

>
> /*
> diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
> index b7cbd66f889e..3700c21ea39d 100644
> --- a/kernel/vhost_task.c
> +++ b/kernel/vhost_task.c
> @@ -75,7 +78,8 @@ struct vhost_task *vhost_task_create(int (*fn)(void *), void *arg,
> const char *name)
> {
> struct kernel_clone_args args = {
> - .flags = CLONE_FS | CLONE_UNTRACED | CLONE_VM,
> + .flags = CLONE_FS | CLONE_THREAD | CLONE_VM |
> + CLONE_UNTRACED,
> .exit_signal = 0,
> .fn = vhost_task_fn,
> .name = name,
>
>
>
>
>
>
>
>
>
>