Re: [PATCH v2] kprobes: Use synchronize_rcu_tasks_rude in kprobe_optimizer

From: Paul E. McKenney
Date: Fri Jan 19 2024 - 09:37:36 EST


On Thu, Jan 18, 2024 at 02:18:42AM +0000, Chen Zhongjin wrote:
> There is a deadlock scenario in kprobe_optimizer():
>
> pid A                            pid B                        pid C
> kprobe_optimizer()               do_exit()                    perf_kprobe_init()
> mutex_lock(&kprobe_mutex)        exit_tasks_rcu_start()       mutex_lock(&kprobe_mutex)
> synchronize_rcu_tasks()          zap_pid_ns_processes()       // waiting kprobe_mutex
> // waiting tasks_rcu_exit_srcu   kernel_wait4()
>                                  // waiting pid C exit
>
> To avoid this deadlock loop, use synchronize_rcu_tasks_rude() in
> kprobe_optimizer() rather than synchronize_rcu_tasks().
> synchronize_rcu_tasks_rude() also promises that all preempted tasks
> have scheduled, but it does not wait for tasks_rcu_exit_srcu.
>
> Fixes: a30b85df7d59 ("kprobes: Use synchronize_rcu_tasks() for optprobe with CONFIG_PREEMPT=y")
> Signed-off-by: Chen Zhongjin <chenzhongjin@xxxxxxxxxx>

Just so you know, your email ends up in gmail's spam folder. :-/

> ---
> v1 -> v2: Add Fixes tag
> ---
> arch/Kconfig | 2 +-
> kernel/kprobes.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index f4b210ab0612..dc6a18854017 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -104,7 +104,7 @@ config STATIC_CALL_SELFTEST
> config OPTPROBES
> def_bool y
> depends on KPROBES && HAVE_OPTPROBES
> - select TASKS_RCU if PREEMPTION
> + select TASKS_RUDE_RCU
>
> config KPROBES_ON_FTRACE
> def_bool y
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index d5a0ee40bf66..09056ae50c58 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -623,7 +623,7 @@ static void kprobe_optimizer(struct work_struct *work)
> * Note that on non-preemptive kernel, this is transparently converted
> * to synchronoze_sched() to wait for all interrupts to have completed.
> */
> - synchronize_rcu_tasks();
> + synchronize_rcu_tasks_rude();

Again, that comment reads in full as follows:

/*
* Step 2: Wait for quiesence period to ensure all potentially
* preempted tasks to have normally scheduled. Because optprobe
* may modify multiple instructions, there is a chance that Nth
* instruction is preempted. In that case, such tasks can return
* to 2nd-Nth byte of jump instruction. This wait is for avoiding it.
* Note that on non-preemptive kernel, this is transparently converted
* to synchronoze_sched() to wait for all interrupts to have completed.
*/

Please note well that first sentence.

Unless that first sentence no longer holds, this patch cannot work
because synchronize_rcu_tasks_rude() will not (repeat, NOT) wait for
preempted tasks.
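
To make that requirement concrete, here is a deliberately simplified
sketch; this is not the real kprobes code path, the helper name is made
up, and the text_poke() call is x86-only:

	#include <linux/rcupdate.h>	/* synchronize_rcu_tasks() */
	#include <asm/text-patching.h>	/* text_poke(), x86 */

	/*
	 * Toy illustration only.  Suppose the optimizer is about to
	 * overwrite a 5-byte region at 'addr' with one 5-byte jump, and
	 * some task was preempted after executing the first instruction
	 * of the old sequence, so its saved PC points at byte 2..5 of
	 * that region.
	 */
	static void toy_optimize(void *addr, const u8 jump[5])
	{
		/*
		 * synchronize_rcu_tasks() returns only after every task
		 * has passed through a voluntary context switch (or
		 * usermode/idle), so the preempted task above has
		 * resumed and left the region.
		 *
		 * synchronize_rcu_tasks_rude() only forces each CPU to
		 * schedule, which covers code running with preemption
		 * disabled but does nothing for a task that is already
		 * off-CPU with its PC inside the region.
		 */
		synchronize_rcu_tasks();

		/* Only now may the region safely become one instruction. */
		text_poke(addr, jump, 5);
	}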

So how to safely break this deadlock? Reproducing Chen Zhongjin's
diagram:

pid A                            pid B                        pid C
kprobe_optimizer()               do_exit()                    perf_kprobe_init()
mutex_lock(&kprobe_mutex)        exit_tasks_rcu_start()       mutex_lock(&kprobe_mutex)
synchronize_rcu_tasks()          zap_pid_ns_processes()       // waiting kprobe_mutex
// waiting tasks_rcu_exit_srcu   kernel_wait4()
                                 // waiting pid C exit

We need to stop synchronize_rcu_tasks() from waiting on tasks like
pid B that are voluntarily blocked. One way to do that is to replace
SRCU with a set of per-CPU lists. Then exit_tasks_rcu_start() adds the
current task to this list and does ...
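
Just to make the shape of that concrete, a very rough sketch of the
registration step only; the per-CPU structure and the task_struct field
below are made-up names, not an actual proposal, and the lock/list
initialization at boot is hand-waved.  The grace-period side would then
scan these lists rather than blocking in SRCU:

	#include <linux/list.h>
	#include <linux/percpu.h>
	#include <linux/sched.h>
	#include <linux/spinlock.h>

	/* Hypothetical per-CPU list of tasks that have entered do_exit(). */
	struct exiting_tasks {
		raw_spinlock_t		lock;	/* initialized at boot */
		struct list_head	list;	/* initialized at boot */
	};
	static DEFINE_PER_CPU(struct exiting_tasks, exiting_tasks);

	/*
	 * Sketch of exit_tasks_rcu_start(): instead of entering an SRCU
	 * read-side critical section, put the exiting task on this CPU's
	 * list (assuming a new list_head were added to task_struct).
	 */
	void exit_tasks_rcu_start(void)
	{
		struct exiting_tasks *et;
		unsigned long flags;

		preempt_disable();
		et = this_cpu_ptr(&exiting_tasks);
		raw_spin_lock_irqsave(&et->lock, flags);
		list_add(&current->rcu_tasks_exit_entry, &et->list);	/* hypothetical field */
		raw_spin_unlock_irqrestore(&et->lock, flags);
		preempt_enable();
	}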

OK, this is getting a bit involved. If you would like to follow along,
please feel free to look here:

https://docs.google.com/document/d/1MEHHs5qbbZBzhN8dGP17pt-d87WptFJ2ZQcqS221d9I/edit?usp=sharing

Thanx, Paul

> /* Step 3: Optimize kprobes after quiesence period */
> do_optimize_kprobes();
> --
> 2.25.1
>