On 6/11/23 00:03, Bart Van Assche wrote:Which is my feeling, too.
On 6/10/23 06:27, Bagas Sanjaya wrote:
On 6/10/23 15:55, Pavel Machek wrote:
Maybe cc the authors of that commit?#regzbot introduced: v5.0..v6.4-rc5 https://bugzilla.kernel.org/show_bug.cgi?id=217530The reporter had found the culprit (via bisection), so:
#regzbot title: Waking up from resume locks up on SCSI CD/DVD drive
#regzbot introduced: a19a93e4c6a98c
Ah! I forgot to do that! Thanks anyway.
Hi Damien,
Why does the ATA code call scsi_rescan_device() before system resume has
finished? Would ATA devices still work with the patch below applied?
I do not know the PM code well at all, need to dig into it. But your patch
worries me as it seems it would prevent rescan of the device on a resume, which
can be an issue if the device has changed.
I am not yet 100% clear on the root cause for this, but I think it comes from
the fact that ata_port_pm_resume() runs before the sci device resume is done, so
with scsi_dev->power.is_suspended still true. And ata_port_pm_resume() calls
ata_port_resume_async() which triggers EH (which will do reset + rescan)
asynchronously. So it looks like we have scsi device resume and libata EH for
rescan fighting each others for the scan mutex and device lock, leading to deadlock.
Trying to recreate this issue now to confirm and debug further. But I suspect
the solution to this may be best implemented in libata, not in scsi.
This looks definitely related to this thread:
https://lore.kernel.org/linux-scsi/7b553268-69d3-913a-f9de-28f8d45bdb1e@xxxxxxx/
Similaraly to your comment on that thread, having to look at
dev->power.is_suspended is not ideal I think. What we need is to have ata and
scsi pm resume be synchronized, but I am not yet 100% clear on the scsi layer side.