Re: [PATCH RESEND] scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock

From: Hannes Reinecke
Date: Wed Aug 16 2023 - 13:10:48 EST


On 8/16/23 17:55, Chengfeng Ye wrote:
There is a long call chain that &fip->ctlr_lock is acquired by isr
fnic_isr_msix_wq_copy() under hard irq context. Thus other process
context code acquiring the lock should disable irq, otherwise
deadlock could happen if the irq preempt the execution while the
lock is held in process context on the same CPU.

[ISR]
fnic_isr_msix_wq_copy()
-> fnic_wq_copy_cmpl_handler()
-> fnic_fcpio_cmpl_handler()
-> fnic_fcpio_flogi_reg_cmpl_handler()
-> fnic_flush_tx()
-> fnic_send_frame()
-> fcoe_ctlr_els_send()
-> spin_lock_bh(&fip->ctlr_lock)

[Process Context]
1. fcoe_ctlr_timer_work()
-> fcoe_ctlr_flogi_send()
-> spin_lock_bh(&fip->ctlr_lock)

2. fcoe_ctlr_recv_work()
-> fcoe_ctlr_recv_handler()
-> fcoe_ctlr_recv_els()
-> fcoe_ctlr_announce()
-> spin_lock_bh(&fip->ctlr_lock)

3. fcoe_ctlr_recv_work()
-> fcoe_ctlr_recv_handler()
-> fcoe_ctlr_recv_els()
-> fcoe_ctlr_flogi_retry()
-> spin_lock_bh(&fip->ctlr_lock)

4. -> fcoe_xmit()
-> fcoe_ctlr_els_send()
-> spin_lock_bh(&fip->ctlr_lock)

spin_lock_bh() is not enough since fnic_isr_msix_wq_copy() is a
hardirq.

These flaws were found by an experimental static analysis tool I am
developing for irq-related deadlock.

The patch fix the potential deadlocks by spin_lock_irqsave() to
disable hard irq.

Signed-off-by: Chengfeng Ye <dg573847474@xxxxxxxxx>
---
drivers/scsi/fcoe/fcoe_ctlr.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)

Reviewed-by: Hannes Reinecke <hare@xxxxxxx>

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman