Re: PM-runtime: supplier looses track of consumer during probe

From: Tushar Nimkar
Date: Fri Dec 02 2022 - 07:23:15 EST


Thanks Adrian and Rafael,
We are trying both patches separately. And will update result once we get.

Thanks,
Tushar Nimkar

On 12/2/2022 1:14 AM, Rafael J. Wysocki wrote:
On Thursday, December 1, 2022 8:28:25 PM CET Rafael J. Wysocki wrote:
On Thu, Dec 1, 2022 at 2:10 PM Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:

On 29/11/22 18:56, Nitin Rawat wrote:
Hi Adrian,

On 11/21/2022 11:38 AM, Tushar Nimkar wrote:
Hi Adrian,

On 11/18/2022 8:25 PM, Adrian Hunter wrote:
On 4/11/22 11:19, Tushar Nimkar wrote:
Hi linux-pm/linux-scsi,

Process -1
ufshcd_async_scan context (process 1)
scsi_autopm_put_device() //0:0:0:0

I am having trouble following your description. What function is calling
scsi_autopm_put_device() here?

Below is flow which calls scsi_autopm_put_device()
Process -1
ufshcd_async_scan()
scsi_probe_and_add_lun()
scsi_add_lun()
slave_configure()
scsi_sysfs_add_sdev()
scsi_autopm_get_device()
device_add() <- invoked [Process 2] sd_probe()
scsi_autopm_put_device()

pm_runtime_put_sync()
__pm_runtime_idle()
rpm_idle() -- RPM_GET_PUT(4)
__rpm_callback
scsi_runtime_idle()
pm_runtime_mark_last_busy()
pm_runtime_autosuspend() --[A]
rpm_suspend() -- RPM_AUTO(8)
pm_runtime_autosuspend_expiration() use_autosuspend is false return 0 --- [B]
__update_runtime_status to RPM_SUSPENDING
__rpm_callback()
__rpm_put_suppliers(dev, false)
__update_runtime_status to RPM_SUSPENDED
rpm_suspend_suppliers()
rpm_idle() for supplier -- RPM_ASYNC(1) return (-EAGAIN) [ Other consumer active for supplier]
rpm_suspend() – END with return=0
scsi_runtime_idle() END return (-EBUSY) always.

Not following here either. Which device is EBUSY and why?

scsi_runtime_idle() return -EBUSY always [3]
Storage/scsi team can better explain -EBUSY implementation.

EBUSY is returned from below code for consumer dev 0:0:0:0.
scsi_runtime_idle is called from scsi_autopm_put_device which inturn is called from ufshcd_async_scan (Process 1 as per above call stack)
static int scsi_runtime_idle(struct device *dev)
{
:

if (scsi_is_sdev_device(dev)) {
pm_runtime_mark_last_busy(dev);
pm_runtime_autosuspend(dev);
return -EBUSY; ---> EBUSY returned from here.
}


}


[3] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/scsi/scsi_pm.c?h=next-20221118#n210



[1]: https://lore.kernel.org/lkml/4748074.GXAFRqVoOG@kreacher/T/
[2]: https://lkml.org/lkml/2022/10/12/259

It looks to me like __rpm_callback() makes assumptions about
dev->power.runtime_status that are not necessarily true because
dev->power.lock is dropped.

Well, this happens because rpm_idle() calls __rpm_callback() and
allows it to run concurrently with rpm_suspend() and rpm_resume(), so
one of them may change runtime_status to RPM_SUSPENDING or
RPM_RESUMING while __rpm_callback() is running.

It is somewhat questionable whether or not this should be allowed to
happen, but since it is generally allowed to suspend the device from
its .runtime_idle callback, there is not too much that can be done
about it.

But this means that the patch below should help too.

I actually think that we can do both, because rpm_idle() doesn't have to do
the whole device links dance and the fact that it still calls __rpm_callback()
is a clear oversight.

---
drivers/base/power/runtime.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

Index: linux-pm/drivers/base/power/runtime.c
===================================================================
--- linux-pm.orig/drivers/base/power/runtime.c
+++ linux-pm/drivers/base/power/runtime.c
@@ -484,7 +484,17 @@ static int rpm_idle(struct device *dev,
dev->power.idle_notification = true;
- retval = __rpm_callback(callback, dev);
+ if (dev->power.irq_safe)
+ spin_unlock(&dev->power.lock);
+ else
+ spin_unlock_irq(&dev->power.lock);
+
+ retval = callback(dev);
+
+ if (dev->power.irq_safe)
+ spin_lock(&dev->power.lock);
+ else
+ spin_lock_irq(&dev->power.lock);
dev->power.idle_notification = false;
wake_up_all(&dev->power.wait_queue);