Re: [PATCH V2] scsi: libsas: Directly kick-off EH when ATA device fell off

From: John Garry
Date: Mon Dec 19 2022 - 10:55:47 EST


On 19/12/2022 15:28, Jason Yan wrote:
+    if (test_bit(SAS_DEV_GONE, &dev->state) && dev_is_sata(dev))
+        sas_ata_device_link_abort(dev, false);

Firstly, I think that there is a bug in the sas_ata_device_link_abort() -> ata_link_abort() code in that the host lock is not grabbed, as the comment in ata_port_abort() mentions. Having said that, libsas already had some dodgy host locking usage - specifically dropping the lock for the queuing path (that's something else to be fixed up ... I think)

Taking big locks in the queuing path is not a good idea. It will hurt performance.

But it is expected that ata_qc_issue() is called with the host lock held (and that the lock stays held).

I think that the reason libsas drops the lock is that some LLDD queuecommand callbacks call task_done() in some error paths. If we kept the lock held, then we could have a deadlock, for example:

sas_ata_qc_issue (has lock) -> lldd_execute_task() = pm8001_queue_command() -> task_done() = sas_ata_task_done() -> grab host lock => deadlock.
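
To make the chain above concrete, here is a rough user-space model in C. It is only a sketch under assumptions: toy_lock and the model_*() functions are made-up stand-ins, not libsas or libata code, and where a real spinlock would spin forever the toy lock returns -EDEADLK so that the problem is observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy non-recursive lock standing in for the host lock (hypothetical). */
struct toy_lock {
	bool held;
};

/* Returns 0 on success, -EDEADLK where a real spinlock would spin forever. */
static int toy_lock_acquire(struct toy_lock *lock)
{
	if (lock->held)
		return -EDEADLK;
	lock->held = true;
	return 0;
}

static void toy_lock_release(struct toy_lock *lock)
{
	assert(lock->held);
	lock->held = false;
}

/* Models the completion path (task_done), which grabs the host lock. */
static int model_task_done(struct toy_lock *host_lock)
{
	int rc = toy_lock_acquire(host_lock);

	if (rc == 0)
		toy_lock_release(host_lock);
	return rc;
}

/* Models an LLDD queuecommand callback completing the task in an error path. */
static int model_lldd_execute_task(struct toy_lock *host_lock)
{
	return model_task_done(host_lock);
}

/*
 * Models the issue path, entered with the host lock held.
 * drop_lock == true mirrors dropping the lock around the LLDD call.
 * Returns the result of the lock attempt made in the completion path.
 */
static int model_qc_issue(bool drop_lock)
{
	struct toy_lock host_lock = { .held = true };	/* caller holds it */
	int rc;

	if (drop_lock)
		toy_lock_release(&host_lock);

	rc = model_lldd_execute_task(&host_lock);

	if (drop_lock)
		toy_lock_acquire(&host_lock);		/* retake before returning */

	return rc;
}
```

In this model, model_qc_issue(false) reports -EDEADLK from the completion path, while model_qc_issue(true) returns 0, which is the only reason I can see for the lock being dropped there.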

Thanks,
John