Re: Fwd: Waking up from resume locks up on sr device

From: Damien Le Moal
Date: Wed Jun 14 2023 - 18:44:55 EST


On 6/15/23 03:04, Bart Van Assche wrote:
> On 6/14/23 07:26, Alan Stern wrote:
>> On Wed, Jun 14, 2023 at 04:35:50PM +0900, Damien Le Moal wrote:
>>> Or... Why the heck scsi_rescan_device() is calling device_lock() ? This
>>> is the only place in scsi code I can see that takes this lock. I suspect
>>> this is to serialize either rescans, or serialize with resume, or both.
>>> For serializing rescans, we can use another lock. For serializing with
>>> PM, we should wait for PM transitions...
>>> Something is not right here.
>>
>> Here's what commit e27829dc92e5 ("scsi: serialize ->rescan against
>> ->remove", written by Christoph Hellwig) says:
>>
>> Lock the device embedded in the scsi_device to protect against
>> concurrent calls to ->remove.
>>
>> That's the commit which added the device_lock() call.
>
> Even if scsi_rescan_device() would use another mechanism for
> serialization against sd_remove() and sr_remove(), we still need to
> solve the issue that the ATA code calls scsi_rescan_device() before
> resuming has finished. scsi_rescan_device() issues I/O. Issuing I/O to a
> device is not allowed before that device has been resumed.

I am not convinced of that: scsi suspend quiecse the queue, thus preventing IOs
from the block layer, but not internale scsi ml commands, which is what
scsi_rescan_device() issues.

In any case, I am thinking that best (and quickest) fix for this issue for now
is to have libata define a device link to make the scsi device a "parent" of the
ata device (which is the ata link as of now). This way, PM operation ordering
will ensure that the scsi device resume will be done before the ata device. What
I really do not like about this though is that the suspend side would be done in
the reverse order: ata first and then scsi, but we really want the reverse here
to ensure that the request queue is quiesced before we suspend ata. That said,
there is no such synchronization right now and so this is probably happening
already without raising issues apparently.

So ideally:
1) Make the ata device the parent of the scsi device using a device link
2) For suspend, the scsi device suspend will be done first, followed by the ata
device, which is what we want.
3) For resume, ata device will be first, followed by scsi device. The call to
scsi_rescan_device() from libata being in a work task, asynchronous from the ata
resume context, we need to synchronize that work to wait for the scsi device
resume to complete. (but do we really given that we are going to issue internal
commands only ?)

Alan, Rafael,

For the synchronization of step (3), if I understand the pm code correctly,
using device_pm_wait_for_dev() would work only if async resume is on. This would
be ineffective for the sync case. How can we best deal with this ?


--
Damien Le Moal
Western Digital Research