Re: PM runtime_error handling missing in many drivers?

From: Oliver Neukum
Date: Tue Jun 21 2022 - 05:38:44 EST


On 20.06.22 16:42, Vincent Whitchurch wrote:

Hi,

> Many drivers do something like the following to resume their hardware
> before performing some hardware access when user space ask for it:
>
> ret = pm_runtime_resume_and_get(dev);
> if (ret)
> return ret;
>
> But if the ->runtime_resume() callback fails, then the
> power.runtime_error is set and any further attempts to use
> pm_runtime_resume_and_get() will fail, as documented in
> Documentation/power/runtime_pm.rst.

Whether this is properly documented is debatable.
4. Runtime PM Device Helper Functions (as a chapter)
does _not_ document it.


> [110778.050000][ T27] rpm_resume: 0-0009 flags-4 cnt-1 dep-0 auto-1 p-0 irq-0 child-0
> [110778.050000][ T27] rpm_return_int: rpm_resume+0x24d/0x11d0:0-0009 ret=-22
>
> The following patch fixes the issue on vcnl4000, but is this the right in the


> fix? And, unless I'm missing something, there are dozens of drivers
> with the same problem.

Yes. The point of pm_runtime_resume_and_get() is to remove the need
for handling errors when the resume fails. So I fail to see why a
permanent record of a failure makes sense for this API.

> diff --git a/drivers/iio/light/vcnl4000.c b/drivers/iio/light/vcnl4000.c
> index e02e92bc2928..082b8969fe2f 100644
> --- a/drivers/iio/light/vcnl4000.c
> +++ b/drivers/iio/light/vcnl4000.c
> @@ -414,6 +414,8 @@ static int vcnl4000_set_pm_runtime_state(struct vcnl4000_data *data, bool on)
>
> if (on) {
> ret = pm_runtime_resume_and_get(dev);
> + if (ret)
> + pm_runtime_set_suspended(dev);
> } else {
> pm_runtime_mark_last_busy(dev);
> ret = pm_runtime_put_autosuspend(dev);

If you need to add this to every driver, you can just as well add it to
pm_runtime_resume_and_get() to avoid the duplication.

But I am afraid we need to ask a deeper question. Is there a point
in recording failures to resume? The error code is reported back.
If a driver wishes to act upon it, it can. The core really only
uses the result to block new PM operations.
But nobody requests a resume unless it is necessary. Thus I fail
to see the point of checking this flag in resume as opposed to
suspend. If we fail, we fail, why not retry? It seems to me that the
record should be used only during runtime suspend.

And as an immediate band aid, some errors like ENOMEM should
never be recorded.

Regards
Oliver