Re: [PATCH 5/5] task_work: use TIF_NOTIFY_SIGNAL if available

From: Thomas Gleixner
Date: Fri Oct 16 2020 - 05:00:27 EST


On Thu, Oct 15 2020 at 12:39, Jens Axboe wrote:
> On 10/15/20 9:49 AM, Oleg Nesterov wrote:
>> You can simply nack the patch which adds TIF_NOTIFY_SIGNAL to
>> arch/xxx/include/asm/thread_info.h.

As if that is going to change anything...

> This seems to be the biggest area of contention right now. Just to
> summarize, we have two options:
>
> 1) We leave the CONFIG_GENERIC_ENTRY requirement, which means that the
> rest of the cleanups otherwise enabled by this series will not be
> able to move forward until the very last arch is converted to the
> generic entry code.
>
> 2) We go back to NOT having the CONFIG_GENERIC_ENTRY requirement, and
> archs can easily opt-in to TIF_NOTIFY_SIGNAL independently of
> switching to the generic entry code.
>
> I understand Thomas's reasoning in wanting to push archs towards the
> generic entry code, and I fully support that. However, it does seem like
> the road paved by #1 is long and potentially neverending, which would
> leave us with never being able to kill the various bits of code that we
> otherwise would be able to.
>
> Thomas, I do agree with Oleg on this one, I think we can make quicker
> progress on cleanups with option #2. This isn't really going to hinder
> any arch conversion to the generic entry code, as arch patches would be
> funeled through the arch trees anyway.

I completely understand the desire to remove the jobctl mess and it
looks like a valuable cleanup on it's own.

But I fundamentaly disagree with the wording of #2:

'and archs can easily opt-in ....'

Just doing it on an opt-in base is not any different from making it
dependent on CONFIG_GENERIC_ENTRY. It's just painted differently and
makes it easy for you to bring the performance improvement to the less
than a handful architectures which actually care about io_uring.

So if you change #2 to:

Drop the CONFIG_GENERIC_ENTRY dependency, make _all_ architectures
use TIF_NOTIFY_SIGNAL and clean up the jobctl and whatever related
mess.

and actually act apon it, then I'm fine with that approach.

Anything else is just proliferating the existing mess and yet another
promise of great improvements which never materialize.

Just to prove my point:

e91b48162332 ("task_work: teach task_work_add() to do signal_wake_up()")

added TWA_SIGNAL in June with the following in the changelog:

TODO: once this patch is merged we need to change all current users
of task_work_add(notify = true) to use TWA_RESUME.

Now let's look at reality:

arch/x86/kernel/cpu/mce/core.c: task_work_add(current, &current->mce_kill_me, true);
arch/x86/kernel/cpu/resctrl/rdtgroup.c: ret = task_work_add(tsk, &callback->work, true);
drivers/acpi/apei/ghes.c: ret = task_work_add(current, &estatus_node->task_work,
drivers/acpi/apei/ghes.c- true);
drivers/android/binder.c: task_work_add(current, &twcb->twork, true);
fs/file_table.c: if (!task_work_add(task, &file->f_u.fu_rcuhead, true))

fs/io_uring.c: ret = task_work_add(req->task, &req->task_work, TWA_RESUME);

fs/io_uring.c: task_work_add(tsk, &req->task_work, 0);

fs/io_uring.c- notify = 0;
fs/io_uring.c- if (!(ctx->flags & IORING_SETUP_SQPOLL) && twa_signal_ok)
fs/io_uring.c- notify = TWA_SIGNAL;
fs/io_uring.c-
fs/io_uring.c: ret = task_work_add(tsk, &req->task_work, notify);

fs/io_uring.c: task_work_add(tsk, &req->task_work, 0);

fs/io_uring.c: task_work_add(tsk, &req->task_work, 0);
fs/io_uring.c: task_work_add(tsk, &req->task_work, 0);

fs/namespace.c: if (!task_work_add(task, &mnt->mnt_rcu, true))

kernel/events/uprobes.c: task_work_add(t, &t->utask->dup_xol_work, true);

kernel/irq/manage.c: task_work_add(current, &on_exit_work, false);

kernel/sched/fair.c: task_work_add(curr, work, true);

kernel/task_work.c:task_work_add(struct task_struct *task, struct callback_head *work, int notify)
kernel/time/posix-cpu-timers.c: task_work_add(tsk, &tsk->posix_cputimers_work.work, TWA_RESUME);

security/keys/keyctl.c: ret = task_work_add(parent, newwork, true);

security/yama/yama_lsm.c: if (task_work_add(current, &info->work, true) == 0)

See? Adding TODO's and making promises about cleanups is easy.

The patch adding this is sloppy at best. Instead of using a named enum
it just defines TWA_RESUME and TWA_SIGNAL.

That makes the code really readable:

notify = 0;
if (cond)
notify = TWA_SIGNAL;

Making that

enum task_work_notify_mode {
TWA_NONE,
TWA_RESUME,
TWA_SIGNAL,
};

would have been not convoluted enough, right?

Also the kernel documentation of task_work_add() is outdated and
partially wrong. Can be fixed later as well, right?

This features first and let others deal with the mess we create mindset
has to stop. I'm dead tired of it.

Thanks,

tglx