Re: [PATCH net] net/smc: Avoid warning of possible recursive locking

From: Tony Lu
Date: Mon Nov 22 2021 - 07:40:03 EST


On Mon, Nov 22, 2021 at 08:32:53PM +0800, Wen Gu wrote:
> Possible recursive locking is detected by lockdep when SMC
> falls back to TCP. The corresponding warnings are as follows:
>
> ============================================
> WARNING: possible recursive locking detected
> 5.16.0-rc1+ #18 Tainted: G E
> --------------------------------------------
> wrk/1391 is trying to acquire lock:
> ffff975246c8e7d8 (&ei->socket.wq.wait){..-.}-{3:3}, at: smc_switch_to_fallback+0x109/0x250 [smc]
>
> but task is already holding lock:
> ffff975246c8f918 (&ei->socket.wq.wait){..-.}-{3:3}, at: smc_switch_to_fallback+0xfe/0x250 [smc]
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&ei->socket.wq.wait);
> lock(&ei->socket.wq.wait);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 2 locks held by wrk/1391:
> #0: ffff975246040130 (sk_lock-AF_SMC){+.+.}-{0:0}, at: smc_connect+0x43/0x150 [smc]
> #1: ffff975246c8f918 (&ei->socket.wq.wait){..-.}-{3:3}, at: smc_switch_to_fallback+0xfe/0x250 [smc]
>
> stack backtrace:
> Call Trace:
> <TASK>
> dump_stack_lvl+0x56/0x7b
> __lock_acquire+0x951/0x11f0
> lock_acquire+0x27a/0x320
> ? smc_switch_to_fallback+0x109/0x250 [smc]
> ? smc_switch_to_fallback+0xfe/0x250 [smc]
> _raw_spin_lock_irq+0x3b/0x80
> ? smc_switch_to_fallback+0x109/0x250 [smc]
> smc_switch_to_fallback+0x109/0x250 [smc]
> smc_connect_fallback+0xe/0x30 [smc]
> __smc_connect+0xcf/0x1090 [smc]
> ? mark_held_locks+0x61/0x80
> ? __local_bh_enable_ip+0x77/0xe0
> ? lockdep_hardirqs_on+0xbf/0x130
> ? smc_connect+0x12a/0x150 [smc]
> smc_connect+0x12a/0x150 [smc]
> __sys_connect+0x8a/0xc0
> ? syscall_enter_from_user_mode+0x20/0x70
> __x64_sys_connect+0x16/0x20
> do_syscall_64+0x34/0x90
> entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> Lockdep flags the nested locking in smc_switch_to_fallback() as a
> possible deadlock because smc_wait->lock and clc_wait->lock belong
> to the same lock class. In practice it is safe so far, since no
> other code path tries to acquire smc_wait->lock while holding
> clc_wait->lock. So the patch replaces spin_lock() with
> spin_lock_nested() to avoid the false positive from lockdep.
>
> Link: https://lkml.org/lkml/2021/11/19/962
> Fixes: 2153bd1e3d3d ("net/smc: Transfer remaining wait queue entries during fallback")
> Reported-by: syzbot+e979d3597f48262cb4ee@xxxxxxxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Wen Gu <guwen@xxxxxxxxxxxxxxxxx>

Acked-by: Tony Lu <tonylu@xxxxxxxxxxxxxxxxx>
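
For readers hitting similar lockdep splats elsewhere: both wait queue
heads embed a spinlock of the same lock class, so the inner acquisition
needs a nesting annotation. A minimal sketch of the pattern follows;
splice_wq_entries() is a made-up helper name for illustration, not the
actual net/smc code:

#include <linux/wait.h>

/* Move all waiters from one wait queue head to another. Both locks
 * share a lockdep class (wait_queue_head_t.lock), so the inner lock
 * is taken with spin_lock_nested() to tell lockdep that this single
 * level of nesting is intentional.
 */
static void splice_wq_entries(wait_queue_head_t *from,
			      wait_queue_head_t *to)
{
	unsigned long flags;

	spin_lock_irqsave(&from->lock, flags);
	spin_lock_nested(&to->lock, SINGLE_DEPTH_NESTING);
	list_splice_init(&from->head, &to->head);
	spin_unlock(&to->lock);
	spin_unlock_irqrestore(&from->lock, flags);
}

Note that the annotation only silences lockdep for this one nesting
level; it is correct only because nothing ever takes the two locks in
the opposite order.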

> ---
> net/smc/af_smc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index b61c802..2692cba 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -585,7 +585,7 @@ static void smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
> * to clcsocket->wq during the fallback.
> */
> spin_lock_irqsave(&smc_wait->lock, flags);
> - spin_lock(&clc_wait->lock);
> + spin_lock_nested(&clc_wait->lock, SINGLE_DEPTH_NESTING);
> list_splice_init(&smc_wait->head, &clc_wait->head);
> spin_unlock(&clc_wait->lock);
> spin_unlock_irqrestore(&smc_wait->lock, flags);
> --
> 1.8.3.1