[PATCH] scsi: ata: Fix a race condition between scsi error handler and ahci interrupt

From: linan666
Date: Wed Aug 09 2023 - 21:52:17 EST


From: Li Nan <linan122@xxxxxxxxxx>

interrupt                            scsi_eh

ahci_error_intr
  =>ata_port_freeze
    =>__ata_port_freeze
      =>ahci_freeze (turn IRQ off)
    =>ata_port_abort
      =>ata_port_schedule_eh
        =>shost->host_eh_scheduled++;
        host_eh_scheduled = 1
                                     scsi_error_handler
                                       =>ata_scsi_error
                                         =>ata_scsi_port_error_handler
                                           =>ahci_error_handler
                                           . =>sata_pmp_error_handler
                                           .   =>ata_eh_thaw_port
                                           .     =>ahci_thaw (turn IRQ on)
ahci_error_intr                            .
  =>ata_port_freeze                        .
    =>__ata_port_freeze                    .
      =>ahci_freeze (turn IRQ off)         .
    =>ata_port_abort                       .
      =>ata_port_schedule_eh               .
        =>shost->host_eh_scheduled++;      .
        host_eh_scheduled = 2              .
                                           =>ata_std_end_eh
                                             =>host->host_eh_scheduled = 0;

'host_eh_scheduled' is now 0, so the SCSI EH thread will not be scheduled
again, and the ATA port remains frozen and will never be re-enabled.

If the EH thread is already running, there is no need to freeze the port
and schedule EH again.
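
For illustration only, the interleaving above can be replayed in a small
user-space model. This is a rough sketch, not kernel code: struct
model_port, model_error_intr() and model_port_stuck() are hypothetical
names, and a single eh_in_progress flag stands in for the
ATA_PFLAG_EH_PENDING | ATA_PFLAG_EH_IN_PROGRESS check.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant port and host state. */
struct model_port {
	bool frozen;           /* port IRQ is turned off */
	bool eh_in_progress;   /* simplified model of the EH pflags */
	int host_eh_scheduled; /* models shost->host_eh_scheduled */
};

/* Models ahci_error_intr() seeing a PORT_IRQ_FREEZE condition. */
static void model_error_intr(struct model_port *p, bool with_fix)
{
	/* With the fix, do not freeze again while EH owns the port. */
	if (with_fix && p->eh_in_progress)
		return;

	p->frozen = true;        /* ata_port_freeze() */
	p->host_eh_scheduled++;  /* ata_port_schedule_eh() */
}

/*
 * Replays the interleaving from the trace. Returns true if the port
 * ends up frozen while no EH run is scheduled to thaw it.
 */
static bool model_port_stuck(bool with_fix)
{
	struct model_port p = { false, false, 0 };

	model_error_intr(&p, with_fix); /* first error, count becomes 1 */

	p.eh_in_progress = true;        /* scsi_error_handler starts */
	p.frozen = false;               /* ata_eh_thaw_port() */
	model_error_intr(&p, with_fix); /* second error arrives during EH */
	p.host_eh_scheduled = 0;        /* ata_std_end_eh() */
	p.eh_in_progress = false;

	return p.frozen && p.host_eh_scheduled == 0;
}
```

In this model, model_port_stuck(false) returns true (the port stays frozen
with no EH pending), while model_port_stuck(true) returns false.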

Reported-by: luojian <luojian5@xxxxxxxxxx>
Signed-off-by: Li Nan <linan122@xxxxxxxxxx>
---
drivers/ata/libahci.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index e2bacedf28ef..0dfb0b807324 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -1840,9 +1840,17 @@ static void ahci_error_intr(struct ata_port *ap, u32 irq_stat)

 	/* okay, let's hand over to EH */

-	if (irq_stat & PORT_IRQ_FREEZE)
+	if (irq_stat & PORT_IRQ_FREEZE) {
+		/*
+		 * EH already running, this may happen if the port is
+		 * thawed in the EH. But we cannot freeze it again
+		 * otherwise the port will never be thawed.
+		 */
+		if (ap->pflags & (ATA_PFLAG_EH_PENDING |
+		    ATA_PFLAG_EH_IN_PROGRESS))
+			return;
 		ata_port_freeze(ap);
-	else if (fbs_need_dec) {
+	} else if (fbs_need_dec) {
 		ata_link_abort(link);
 		ahci_fbs_dec_intr(ap);
 	} else
--
2.39.2