Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task

From: Masami Hiramatsu
Date: Thu Apr 09 2020 - 10:41:08 EST


On Wed, 8 Apr 2020 18:46:41 +0200
Jiri Olsa <jolsa@xxxxxxxxxx> wrote:

> hi,
> Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
> My test was also able to trigger lockdep output:
>
> ============================================
> WARNING: possible recursive locking detected
> 5.6.0-rc6+ #6 Not tainted
> --------------------------------------------
> sched-messaging/2767 is trying to acquire lock:
> ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0
>
> but task is already holding lock:
> ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&(kretprobe_table_locks[i].lock));
> lock(&(kretprobe_table_locks[i].lock));
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 1 lock held by sched-messaging/2767:
> #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
>
> stack backtrace:
> CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
> Call Trace:
> dump_stack+0x96/0xe0
> __lock_acquire.cold.57+0x173/0x2b7
> ? native_queued_spin_lock_slowpath+0x42b/0x9e0
> ? lockdep_hardirqs_on+0x590/0x590
> ? __lock_acquire+0xf63/0x4030
> lock_acquire+0x15a/0x3d0
> ? kretprobe_hash_lock+0x52/0xa0
> _raw_spin_lock_irqsave+0x36/0x70
> ? kretprobe_hash_lock+0x52/0xa0
> kretprobe_hash_lock+0x52/0xa0
> trampoline_handler+0xf8/0x940
> ? kprobe_fault_handler+0x380/0x380
> ? find_held_lock+0x3a/0x1c0
> kretprobe_trampoline+0x25/0x50
> ? lock_acquired+0x392/0xbc0
> ? _raw_spin_lock_irqsave+0x50/0x70
> ? __get_valid_kprobe+0x1f0/0x1f0
> ? _raw_spin_unlock_irqrestore+0x3b/0x40
> ? finish_task_switch+0x4b9/0x6d0
> ? __switch_to_asm+0x34/0x70
> ? __switch_to_asm+0x40/0x70
>
> The code within the kretprobe handler checks for probe reentrancy,
> so we won't trigger any _raw_spin_lock_irqsave probe in there.
>
> The problem is in outside kprobe_flush_task, where we call:
>
> kprobe_flush_task
> kretprobe_table_lock
> raw_spin_lock_irqsave
> _raw_spin_lock_irqsave
>
> where _raw_spin_lock_irqsave triggers the kretprobe and installs
> kretprobe_trampoline handler on _raw_spin_lock_irqsave return.
>
> The kretprobe_trampoline handler is then executed with already
> locked kretprobe_table_locks, and first thing it does is to
> lock kretprobe_table_locks ;-) the whole lockup path like:
>
> kprobe_flush_task
> kretprobe_table_lock
> raw_spin_lock_irqsave
> _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
>
> ---> kretprobe_table_locks locked
>
> kretprobe_trampoline
> trampoline_handler
> kretprobe_hash_lock(current, &head, &flags); <--- deadlock
>
> The change below sets current_kprobe in kprobe_flush_task, so the probe
> recursion protection check is hit and the probe is never set. It seems
> to fix the deadlock.
>
> I'm not sure this is the best fix, any ideas are welcome ;-)

OK, I just have 1 comment. :)

>
> thanks,
> jirka
>
>
> ---
> kernel/kprobes.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 2625c241ac00..b13247cae752 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1236,6 +1236,10 @@ __releases(hlist_lock)
> }
> NOKPROBE_SYMBOL(kretprobe_table_unlock);
>
> +static struct kprobe kretprobe_dummy = {
> + .addr = (void *)kretprobe_trampoline,
> +};
> +
> /*
> * This function is called from finish_task_switch when task tk becomes dead,
> * so that we can recycle any function-return probe instances associated
> @@ -1256,12 +1260,14 @@ void kprobe_flush_task(struct task_struct *tk)
> INIT_HLIST_HEAD(&empty_rp);
> hash = hash_ptr(tk, KPROBE_HASH_BITS);
> head = &kretprobe_inst_table[hash];
> + __this_cpu_write(current_kprobe, &kretprobe_dummy);

Can you also set the kcb->kprobe_state = KPROBE_HIT_ACTIVE?

BTW, we may be better to introduce a common kprobe_reject_section_start()
and kprobe_reject_section_end() so that the user don't need to prepare
dummy kprobes.

Thank you,

> kretprobe_table_lock(hash, &flags);
> hlist_for_each_entry_safe(ri, tmp, head, hlist) {
> if (ri->task == tk)
> recycle_rp_inst(ri, &empty_rp);
> }
> kretprobe_table_unlock(hash, &flags);
> + __this_cpu_write(current_kprobe, NULL);
> hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
> hlist_del(&ri->hlist);
> kfree(ri);
> --
> 2.25.2
>


--
Masami Hiramatsu <mhiramat@xxxxxxxxxx>