Re: [PATCH v7] scsi: ufs: Quiesce all scsi devices before shutdown

From: Stanley Chu
Date: Thu Aug 13 2020 - 04:56:00 EST


Hi Bart, Can, Chaotian,

Very appreciate your comments and suggestions, please see update below,

On Tue, 2020-08-04 at 00:04 +0800, Bart Van Assche wrote:
> On 2020-08-03 03:04, Stanley Chu wrote:
> > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> > index 307622284239..7cb220b3fde0 100644
> > --- a/drivers/scsi/ufs/ufshcd.c
> > +++ b/drivers/scsi/ufs/ufshcd.c
> > @@ -8640,6 +8640,7 @@ EXPORT_SYMBOL(ufshcd_runtime_idle);
> > int ufshcd_shutdown(struct ufs_hba *hba)
> > {
> > int ret = 0;
> > + struct scsi_target *starget;
> >
> > if (!hba->is_powered)
> > goto out;
> > @@ -8647,11 +8648,27 @@ int ufshcd_shutdown(struct ufs_hba *hba)
> > if (ufshcd_is_ufs_dev_poweroff(hba) && ufshcd_is_link_off(hba))
> > goto out;
> >
> > - if (pm_runtime_suspended(hba->dev)) {
> > - ret = ufshcd_runtime_resume(hba);
> > - if (ret)
> > - goto out;
> > - }
> > + /*
> > + * Let runtime PM framework manage and prevent concurrent runtime
> > + * operations with shutdown flow.
> > + */
> > + pm_runtime_get_sync(hba->dev);
> > +
> > + /*
> > + * Quiesce all SCSI devices to prevent any non-PM requests sending
> > + * from block layer during and after shutdown.
> > + *
> > + * Here we can not use blk_cleanup_queue() since PM requests
> > + * (with BLK_MQ_REQ_PREEMPT flag) are still required to be sent
> > + * through block layer. Therefore SCSI command queued after the
> > + * scsi_target_quiesce() call returned will block until
> > + * blk_cleanup_queue() is called.
> > + *
> > + * Besides, scsi_target_"un"quiesce (e.g., scsi_target_resume) can
> > + * be ignored since shutdown is one-way flow.
> > + */
> > + list_for_each_entry(starget, &hba->host->__targets, siblings)
> > + scsi_target_quiesce(starget);
> >
> > ret = ufshcd_suspend(hba, UFS_SHUTDOWN_PM);
> > out:
>
> This seems wrong to me. Since ufshcd_shutdown() shuts down the link I think
> it should call scsi_remove_device() instead of scsi_target_quiesce().

I tried many ways to come out the final solution. Currently two options
are considered,

== Option 1 ==
pm_runtime_get_sync(hba->dev);

shost_for_each_device(sdev, hba->host) {
scsi_autopm_get_device(sdev);
if (sdev == hba->sdev_ufs_device)
scsi_device_quiesce(sdev);
else
scsi_remove_device(sdev);
}

ret = ufshcd_suspend(hba, UFS_SHUTDOWN_PM);

scsi_remove_device(hba->sdev_ufs_device);

Note. Using scsi_autopm_get_device() instead of pm_runtime_disable()
is to prevent noisy message by below checking,

WARN_ON_ONCE(sdev->quiesced_by && sdev->quiesced_by != current);

in
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/scsi/scsi_lib.c#n2515

This warning shows up if we try to quiesce a runtime-suspended SCSI
device. This is possible during our new shutdown flow. Using
scsi_autopm_get_device() to resume all SCSI devices first can prevent
it.

In addition, normally sd_shutdown() would be executed prior than
ufshcd_shutdown(). If scsi_remove_device() is invoked by
ufshcd_shutdown(), sd_shutdown() will be executed again for a SCSI disk
by

[ 131.398977] sd_shutdown+0x44/0x118
[ 131.399416] sd_remove+0x5c/0xc4
[ 131.399824] device_release_driver_internal+0x1c4/0x2e4
[ 131.400481] device_release_driver+0x18/0x24
[ 131.401018] bus_remove_device+0x108/0x134
[ 131.401533] device_del+0x2dc/0x630
[ 131.401973] __scsi_remove_device+0xc0/0x174
[ 131.402510] scsi_remove_device+0x30/0x48
[ 131.403014] ufshcd_shutdown+0xc8/0x138

In this case, we could see SYNCHRONIZE_CACHE command will be sent to the
same SCSI device twice. This is kind of wired during shutdown flow.

Moreover, in consideration of performance of ufshcd_shutdown(), Option 1
obviously degrades the latency a lot by scsi_remove_device(). Please see
the "Performance Measurement" data below.

Compared Option 2, this way is simpler and also effective. This way may
be a better compromise.

== Option 2 ==
pm_runtime_get_sync(hba->dev);

shost_for_each_device(sdev, hba->host) {
scsi_autopm_get_device(sdev);
scsi_device_quiesce(sdev);
}

== Performance Measurement ==
As-Is: < 5 ms
Option 1: 850 ms
Option 2: 60 ms

What would you prefer? Or would you have any further suggestions?

Thanks,

Stanley Chu