Re: [RFC PATCH 0/8] vhost_tasks: Use CLONE_THREAD/SIGHAND

From: Christian Brauner
Date: Thu May 18 2023 - 10:30:25 EST


On Thu, May 18, 2023 at 10:25:11AM +0200, Christian Brauner wrote:
> On Wed, May 17, 2023 at 07:09:12PM -0500, Mike Christie wrote:
> > This patch allows the vhost and vhost_task code to use CLONE_THREAD,
> > CLONE_SIGHAND and CLONE_FILES. It's an RFC because I didn't do all the
> > normal testing, haven't converted vsock and vdpa, and I know you guys
> > will not like the first patch. However, I think it better shows what
>
> Just to summarize: the core idea behind my proposal is that no signal
> handling changes are needed unless there's a bug in the way io_uring
> workers already work. All that should be needed is
> s/PF_IO_WORKER/PF_USER_WORKER/ in signal.c.
>
> If you follow my proposal then vhost and io_uring workers should almost
> collapse into the same concept. Specifically, io_uring workers and vhost
> workers should behave the same when it comes to handling signals.
>
> See
> https://lore.kernel.org/lkml/20230518-kontakt-geduckt-25bab595f503@brauner
>
>
> > we need from the signal code and how we can support signals in the
> > vhost_task layer.
> >
> > Note that I took the super simple route and kicked off some work to
> > the system workqueue. We can do more invasive approaches:
> > 1. Modify the vhost drivers so they can check for IO completions using
> > a non-blocking interface. We then don't need to run from the system
> > workqueue and can run from the vhost_task.
> >
> > 2. We could drop patch 1 and just say we are doing a polling type
> > of approach. We then modify the vhost layer similarly to #1 so we
> > can check for completions using a non-blocking interface and run
> > from the vhost_task itself.
>
> My preference would be to do the minimal thing now: whatever has the
> least bug potential and is easiest for us non-vhost experts to review.
> Then you can take all the time to rework and improve the vhost infra
> based on the possibilities that using user workers offers. Plus, that
> can easily happen in the next kernel cycle.
>
> Remember that we're trying to fix a regression here. A regression on an
> unreleased kernel, but still.

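To make the signal.c part of the quoted proposal concrete: one of the
io_uring-specific checks in kernel/signal.c is the fatal signal case in
get_signal(), which looks roughly like this (quoting from memory, not an
exact diff against any particular tree):

	/*
	 * PF_IO_WORKER threads will catch and exit on fatal signals
	 * themselves. They have cleanup that must be performed, so
	 * we cannot call do_exit() on their behalf.
	 */
	if (current->flags & PF_IO_WORKER)
		goto out;

If that check simply becomes PF_USER_WORKER, vhost workers get the exact
same fatal-signal behavior that io_uring workers already have, with no
new signal handling code.
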
Just two more thoughts:

The following places currently check for PF_IO_WORKER:

arch/x86/include/asm/fpu/sched.h: !(current->flags & (PF_KTHREAD | PF_IO_WORKER))) {
arch/x86/kernel/fpu/context.h: if (WARN_ON_ONCE(current->flags & (PF_KTHREAD | PF_IO_WORKER)))
arch/x86/kernel/fpu/core.c: if (!(current->flags & (PF_KTHREAD | PF_IO_WORKER)) &&

Neither PF_KTHREAD nor PF_IO_WORKER tasks need TIF_NEED_FPU_LOAD because
they never return to userspace. But that's not specific to
PF_IO_WORKERs. Please generalize this to just check for PF_USER_WORKER
via a simple s/PF_IO_WORKER/PF_USER_WORKER/g in these places, as
sketched below.
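
For the core.c line above that's just (same mechanical change at the
other two sites; this assumes PF_USER_WORKER is also set on io_uring
workers, which is what the proposal implies):

-	if (!(current->flags & (PF_KTHREAD | PF_IO_WORKER)) &&
+	if (!(current->flags & (PF_KTHREAD | PF_USER_WORKER)) &&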

Another thing: in the sched code we have hooks in sched_submit_work()
and sched_update_worker() specific to PF_IO_WORKERs. But again, I don't
think this needs to be special to PF_IO_WORKERs. It might be generally
useful for PF_USER_WORKER. So we should probably generalize this and
have generic user_worker_sleeping() and user_worker_running() helpers
that figure out internally which specific helper to call (rough sketch
below). That's not something that needs to be done right now though,
since I don't think vhost needs this functionality.
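
Something like this is what I have in mind; just a sketch with the
helper names from above, and today only the io_uring case would exist:

static void user_worker_sleeping(struct task_struct *tsk)
{
	/* Only io_uring workers have a sleeping hook so far. */
	if (tsk->flags & PF_IO_WORKER)
		io_wq_worker_sleeping(tsk);
	/* Future PF_USER_WORKER users (vhost, ...) would hook in here. */
}

static void user_worker_running(struct task_struct *tsk)
{
	if (tsk->flags & PF_IO_WORKER)
		io_wq_worker_running(tsk);
}

sched_submit_work() and sched_update_worker() would then only need to
check PF_USER_WORKER and call these helpers.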

But we should generalize this for the next development cycle so we have
it all nice and clean when someone actually needs it. Overall this would
mean that there's only a single place left where PF_IO_WORKER needs to
be checked, and that's in the io_uring code itself. And if we do things
just right we might not even need the PF_IO_WORKER flag at all anymore.
But again, that's just notes for next cycle.

Thoughts? Rotten apples?