Re: [PATCH] scsi: libsas: Grab the host lock in sas_ata_device_link_abort()

From: Jason Yan
Date: Wed Dec 21 2022 - 01:35:28 EST


On 2022/12/21 11:59, Damien Le Moal wrote:
On 2022/12/21 11:42, Jason Yan wrote:
On 2022/12/21 8:36, Damien Le Moal wrote:
On 2022/12/20 23:59, John Garry wrote:
On 20/12/2022 12:53, Xingui Yang wrote:
Grab the host lock in sas_ata_device_link_abort() before calling

This is should be the ata port lock, right? I know that the ata comments
say differently.

ata_link_abort(), as the comment in ata_link_abort() mentions.


Can you please add a fixes tag?

Signed-off-by: Xingui Yang <yangxingui@xxxxxxxxxx>

Reviewed-by: John Garry <john.g.garry@xxxxxxxxxx>

---
drivers/scsi/libsas/sas_ata.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index f7439bf9cdc6..4f2017b21e6d 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -889,7 +889,9 @@ void sas_ata_device_link_abort(struct domain_device *device, bool force_reset)
{
struct ata_port *ap = device->sata_dev.ap;
struct ata_link *link = &ap->link;
+ unsigned long flags;
+ spin_lock_irqsave(ap->lock, flags);
device->sata_dev.fis[2] = ATA_ERR | ATA_DRDY; /* tf status */
device->sata_dev.fis[3] = ATA_ABORTED; /* tf error */
@@ -897,6 +899,7 @@ void sas_ata_device_link_abort(struct domain_device *device, bool force_reset)
if (force_reset)
link->eh_info.action |= ATA_EH_RESET;
ata_link_abort(link);

Really need to add lockdep annotations in libata to avoid/catch such bugs...
Will work on that.

Actually in libata there are many places calling ata_link_abort() not
holding port lock. And some places are holding the real host
lock(ata_host->lock) while calling ata_link_abort(). So if you add the
lockdep annotations, there may be too many warnings. If these are real
issues, we should fix them first.

libata-EH does most of its work without the port lock held because by the time
we get EH started, we are guaranteed to be idle with no commands in flight. That
is why the calls you mention look like "bugs" but are not.

What about the interrupt handler such as ahci_error_intr()? I didn't see the callers hold the port lock too. Do they need the port lock?