Re: [PATCH v2] kprobes: Use synchronize_rcu_tasks_rude in kprobe_optimizer

From: Paul E. McKenney
Date: Fri Jan 19 2024 - 05:55:05 EST


On Thu, Jan 18, 2024 at 06:44:54AM -0800, Paul E. McKenney wrote:
> On Wed, Jan 17, 2024 at 09:26:46PM -0500, Steven Rostedt wrote:
> > On Thu, 18 Jan 2024 02:18:42 +0000
> > Chen Zhongjin <chenzhongjin@xxxxxxxxxx> wrote:
> >
> > > There is a deadlock scenario in kprobe_optimizer():
> > >
> > > pid A pid B pid C
> > > kprobe_optimizer() do_exit() perf_kprobe_init()
> > > mutex_lock(&kprobe_mutex) exit_tasks_rcu_start() mutex_lock(&kprobe_mutex)
> > > synchronize_rcu_tasks() zap_pid_ns_processes() // waiting kprobe_mutex
> > > // waiting tasks_rcu_exit_srcu kernel_wait4()
> > > // waiting pid C exit
> > >
> > > To avoid this deadlock loop, use synchronize_rcu_tasks_rude() in kprobe_optimizer()
> > > rather than synchronize_rcu_tasks(). synchronize_rcu_tasks_rude() can also promise
> > > that all preempted tasks have scheduled, but it will not wait tasks_rcu_exit_srcu.
> > >
> >
> > Did lockdep detect this? If not, we should fix that.
> >
> > I'm also thinking if we should find another solution, as this seems more of
> > a work around than a fix.
>
> My suggestion is at 526b12e4-4bb0-47b1-bece-66b47bfc0a92@paulmck-laptop.
>
> Better suggestions are of course welcome. ;-)
>
> > > Fixes: a30b85df7d59 ("kprobes: Use synchronize_rcu_tasks() for optprobe with CONFIG_PREEMPT=y")
> > > Signed-off-by: Chen Zhongjin <chenzhongjin@xxxxxxxxxx>
> > > ---
> > > v1 -> v2: Add Fixes tag
> > > ---
> > > arch/Kconfig | 2 +-
> > > kernel/kprobes.c | 2 +-
> > > 2 files changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/Kconfig b/arch/Kconfig
> > > index f4b210ab0612..dc6a18854017 100644
> > > --- a/arch/Kconfig
> > > +++ b/arch/Kconfig
> > > @@ -104,7 +104,7 @@ config STATIC_CALL_SELFTEST
> > > config OPTPROBES
> > > def_bool y
> > > depends on KPROBES && HAVE_OPTPROBES
> > > - select TASKS_RCU if PREEMPTION
> > > + select TASKS_RUDE_RCU
> >
> > Is this still a bug if PREEMPTION is not enabled?
>
> Both "select" clauses would be needed for this patch, if I understand
> correctly.
>
> Thanx, Paul
>
> > -- Steve
> >
> > >
> > > config KPROBES_ON_FTRACE
> > > def_bool y
> > > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > > index d5a0ee40bf66..09056ae50c58 100644
> > > --- a/kernel/kprobes.c
> > > +++ b/kernel/kprobes.c
> > > @@ -623,7 +623,7 @@ static void kprobe_optimizer(struct work_struct *work)
> > > * Note that on non-preemptive kernel, this is transparently converted
> > > * to synchronoze_sched() to wait for all interrupts to have completed.
> > > */
> > > - synchronize_rcu_tasks();
> > > + synchronize_rcu_tasks_rude();

The full comment reads as follows:

/*
* Step 2: Wait for quiesence period to ensure all potentially
* preempted tasks to have normally scheduled. Because optprobe
* may modify multiple instructions, there is a chance that Nth
* instruction is preempted. In that case, such tasks can return
* to 2nd-Nth byte of jump instruction. This wait is for avoiding it.
* Note that on non-preemptive kernel, this is transparently converted
* to synchronoze_sched() to wait for all interrupts to have completed.
*/

Except that synchronize_rcu_tasks_rude() isn't going to wait for any
preempted tasks. It instead waits only for regions of code that have
disabled preemption (or interrrupts or ...). So either the above comment
is wrong and needs to be fixed, or this change breaks kprobes. Last
I knew, the comment was correct.

So I still like the idea of using non-preemptability to wait for tasks
near the end of do_exit(), but unless I am missing something, this patch
as written is inserting a rare but real bug.

Steve,thoughts?

Thanx, Paul

> > > /* Step 3: Optimize kprobes after quiesence period */
> > > do_optimize_kprobes();
> >