Re: [PATCH v1] PM-runtime: Check supplier_preactivated before release supplier

From: Peter Wang
Date: Thu Jun 30 2022 - 11:19:33 EST



On 6/30/22 10:47 PM, Rafael J. Wysocki wrote:
On Thu, Jun 30, 2022 at 4:26 PM Peter Wang <peter.wang@xxxxxxxxxxxx> wrote:

On 6/30/22 12:01 AM, Rafael J. Wysocki wrote:
[Add CCs to linix-pm, LKML and Greg]

On Wednesday, June 29, 2022 5:32:00 PM CEST Rafael J. Wysocki wrote:
On Wed, Jun 29, 2022 at 4:47 PM Peter Wang <peter.wang@xxxxxxxxxxxx> wrote:
On 6/29/22 9:22 PM, Rafael J. Wysocki wrote:
On Wed, Jun 29, 2022 at 5:02 AM Peter Wang <peter.wang@xxxxxxxxxxxx> wrote:
On 6/28/22 11:54 PM, Rafael J. Wysocki wrote:
On Tue, Jun 28, 2022 at 3:53 AM Peter Wang <peter.wang@xxxxxxxxxxxx> wrote:
On 6/28/22 3:00 AM, Rafael J. Wysocki wrote:
On Mon, Jun 13, 2022 at 2:08 PM <peter.wang@xxxxxxxxxxxx> wrote:
From: Peter Wang <peter.wang@xxxxxxxxxxxx>

With divice link of DL_FLAG_PM_RUNTIME, if consumer call pm_runtime_get_suppliers
to prevent supplier enter suspend, pm_runtime_release_supplier should
check supplier_preactivated before let supplier enter suspend.
Why?
because supplier_preactivated is true means supplier cannot enter
suspend, right?
No, it doesn't mean that.
Hi Rafael,

if supplier_preactivated is true, means someone call
pm_runtime_get_suppliers and
before pm_runtime_put_suppliers right? This section suppliers should not
enter suspend.
No, this is not how this is expected to work.

First off, the only caller of pm_runtime_get_suppliers() and
pm_runtime_put_suppliers() is __driver_probe_device(). Really nobody
else has any business that would require calling them.
Hi Rafael,

Yes, you are right!
__driver_probe_device the only one use and just because
__driver_probe_device use
pm_runtime_get_suppliers cause problem.


Second, the role of pm_runtime_get_suppliers() is to "preactivate" the
suppliers before running probe for a consumer device and the role of
the role of pm_runtime_get_suppliers() is to "preactivate" the suppliers,
but suppliers may suspend immediately after preactivate right?
Here is just this case. this is first racing point.
Thread A: pm_runtime_get_suppliers -> __driver_probe_device
Thread B: pm_runtime_release_supplier
Thread A: Run with supplier not preactivate -> __driver_probe_device

pm_runtime_put_suppliers() is to do the cleanup in case the device is
left in suspend after probing.

IOW, pm_runtime_get_suppliers() is to ensure that the suppliers will
be active until the probe callback takes over and the rest depends on
that callback.
The problem of this racing will finally let consumer is active but
supplier is suspended.
So it would be better to send a bug report regarding this.

The link relation is broken.
I know you may curious how it happened? right?
Honestly, I am not sure, but I think the second racing point
is rpm_get_suppliers and pm_runtime_put_suppliers(release rpm_active).
I'm not sure what you mean by "the racing point".

Yes, these functions can run concurrently.

So, I try to fix the first racing point and the problem is gone.
It is full meet expect, and the pm runtime will work smoothly after
__driver_probe_device done.
I'm almost sure that there is at least one scenario that would be
broken by this change.
That said, the code in there may be a bit overdesigned.

Does the patch below help?

---
drivers/base/power/runtime.c | 14 +-------------
1 file changed, 1 insertion(+), 13 deletions(-)

Index: linux-pm/drivers/base/power/runtime.c
===================================================================
--- linux-pm.orig/drivers/base/power/runtime.c
+++ linux-pm/drivers/base/power/runtime.c
@@ -1768,7 +1768,6 @@ void pm_runtime_get_suppliers(struct dev
if (link->flags & DL_FLAG_PM_RUNTIME) {
link->supplier_preactivated = true;
pm_runtime_get_sync(link->supplier);
- refcount_inc(&link->rpm_active);
}

device_links_read_unlock(idx);
@@ -1788,19 +1787,8 @@ void pm_runtime_put_suppliers(struct dev
list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
device_links_read_lock_held())
if (link->supplier_preactivated) {
- bool put;
-
link->supplier_preactivated = false;
-
- spin_lock_irq(&dev->power.lock);
-
- put = pm_runtime_status_suspended(dev) &&
- refcount_dec_not_one(&link->rpm_active);
-
- spin_unlock_irq(&dev->power.lock);
-
- if (put)
- pm_runtime_put(link->supplier);
+ pm_runtime_put(link->supplier);
}

device_links_read_unlock(idx);

Hi Rafael,

I think this patch solve the rpm_active racing problem.
But it still have problem that
pm_runtime_get_suppliers call pm_runtime_get_sync(link->supplier)
and supplier could suspend immediately by other thread who call
pm_runtime_release_supplier.
No, it won't, because pm_runtime_release_supplier() won't drop the
reference on the supplier taken by pm_runtime_get_suppliers(0 after
the patch.

Hi Rafael,

I think pm_runtime_release_supplier will always decrese the reference
rpm_active count to 1 and check idle will let supplier enter suspend. Am I wrong?
Could you explain why this patch won't drop the reference?

Thanks

Peter