Re: [PATCH V2] scsi: libsas: Directly kick-off EH when ATA device fell off

From: Damien Le Moal
Date: Mon Dec 19 2022 - 17:59:20 EST


On 12/20/22 00:28, Jason Yan wrote:
> On 2022/12/19 17:23, John Garry wrote:
>> On 16/12/2022 10:03, Xingui Yang wrote:
>>> If the ATA device fell off, call sas_ata_device_link_abort() directly and
>>> mark all outstanding QCs as failed and kick-off EH Immediately. This
>>> avoids
>>> having to wait for block layer timeouts.
>>>
>>> Signed-off-by: Xingui Yang <yangxingui@xxxxxxxxxx>
>>> ---
>>> Changes to v1:
>>> - Use dev_is_sata() to check ATA device type
>>>   drivers/scsi/libsas/sas_discover.c | 3 +++
>>>   1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/scsi/libsas/sas_discover.c
>>> b/drivers/scsi/libsas/sas_discover.c
>>> index d5bc1314c341..a12b65eb4a2a 100644
>>> --- a/drivers/scsi/libsas/sas_discover.c
>>> +++ b/drivers/scsi/libsas/sas_discover.c
>>> @@ -362,6 +362,9 @@ static void sas_destruct_ports(struct asd_sas_port
>>> *port)
>>>   void sas_unregister_dev(struct asd_sas_port *port, struct
>>> domain_device *dev)
>>>   {
>>> +    if (test_bit(SAS_DEV_GONE, &dev->state) && dev_is_sata(dev))
>>> +        sas_ata_device_link_abort(dev, false);
>>
>> Firstly, I think that there is a bug in sas_ata_device_link_abort() ->
>> ata_link_abort() code in that the host lock in not grabbed, as the
>> comment in ata_port_abort() mentions. Having said that, libsas had
>> already some dodgy host locking usage - specifically dropping the lock
>> for the queuing path (that's something else to be fixed up ... I think
>
> Taking big locks in queuing path is not a good idea. This will bring
> down performance.

With HDDs ? You will not see any difference (and SATA SSDs are not a thing
anymore, enough that we should worry too much. NVMe took over). And that
"big lock" is libata is really an integral part of the design. To remove
it, you will need to rewrite libata entirely...

>
>
>> that is due to queue command CB calling task_done() in some cases), but
>> I still think that sas_ata_device_link_abort() should be fixed (to grab
>> the host lock).
>
> For sas_ata_device_link_abort(), it should grab ap->lock.

Which is what libata code comments (mistakenly in many places) always
refer as host lock.

>
> Thanks,
> Jason

--
Damien Le Moal
Western Digital Research