Re: [PATCH 1/1] thermal/drivers/imx_sc_thermal: return -EAGAIN when SCFW turn off resource

From: Ulf Hansson
Date: Wed Aug 16 2023 - 17:24:47 EST


On Wed, 16 Aug 2023 at 22:46, Daniel Lezcano <daniel.lezcano@xxxxxxxxxx> wrote:
>
> On 16/08/2023 19:07, Frank Li wrote:
> > On Wed, Aug 16, 2023 at 06:47:17PM +0200, Daniel Lezcano wrote:
> >> On 16/08/2023 18:28, Frank Li wrote:
> >>> On Wed, Aug 16, 2023 at 10:44:32AM +0200, Daniel Lezcano wrote:
> >>>>
> >>>> Hi Frank,
> >>>>
> >>>> sorry for the delay
> >>>>
> >>>> On 14/07/2023 19:19, Frank Li wrote:
> >>>>> On Thu, Jul 13, 2023 at 02:49:54PM +0200, Daniel Lezcano wrote:
> >>>>>> On 12/07/2023 23:05, Frank Li wrote:
> >>>>>>> Avoid endless print following message when SCFW turns off resource.
> >>>>>>> [ 1818.342337] thermal thermal_zone0: failed to read out thermal zone (-1)
> >>>>>>>
> >>>>>>> Signed-off-by: Frank Li <Frank.Li@xxxxxxx>
> >>>>>>> ---
> >>>>>>> drivers/thermal/imx_sc_thermal.c | 4 +++-
> >>>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/thermal/imx_sc_thermal.c b/drivers/thermal/imx_sc_thermal.c
> >>>>>>> index 8d6b4ef23746..0533d58f199f 100644
> >>>>>>> --- a/drivers/thermal/imx_sc_thermal.c
> >>>>>>> +++ b/drivers/thermal/imx_sc_thermal.c
> >>>>>>> @@ -58,7 +58,9 @@ static int imx_sc_thermal_get_temp(struct thermal_zone_device *tz, int *temp)
> >>>>>>> hdr->size = 2;
> >>>>>>> ret = imx_scu_call_rpc(thermal_ipc_handle, &msg, true);
> >>>>>>> - if (ret)
> >>>>>>> + if (ret == -EPERM) /* NO POWER */
> >>>>>>> + return -EAGAIN;
> >>>>>>
> >>>>>> Isn't there a chain call somewhere when the resource is turned off, so the
> >>>>>> thermal zone can be disabled?
> >>>>>
> >>>>> A possible place in drivers/firmware/imx/scu-pd.c. but I am not sure how to
> >>>>> get thermal devices. I just found a API thermal_zone_get_zone_by_name(). I
> >>>>> am not sure if it is good to depend on "name", which add coupling between
> >>>>> two drivers and if there are external thermal devices(such as) has the
> >>>>> same name, it will wrong turn off.
> >>>>
> >>>> Correct
> >>>>
> >>>>> If add power domain notification in thermal driver, I am not how to get
> >>>>> other devices's pd in thermal driver.
> >>>>>
> >>>>> Any example I can refer?
> >>>>>
> >>>>> Or this is simple enough solution.
> >>>>
> >>>> The solution works for removing the error message but it does not solve the
> >>>> root cause of the issue. The thermal zone keeps monitoring while the sensor
> >>>> is down.
> >>>>
> >>>> So the question is why the sensor is shut down if it is in use?
> >>>
> >>> Do you know if there are any code I reference? I supposed it is quite common.
> >>
> >> Sorry, I don't get your comment
> >>
> >> What I meant is why is the sensor turned off if it is in use ?
> >
> > One typical example is cpu hotplug. The sensor is located CPU power domain.
> > If CPU hotplug off, CPU power domain will be turn off.
> >
> > It doesn't make sensor keep monitor such sensor when CPU already power off.
> > It doesn't make sensor to keep CPU power on just because want to get sensor
> > data.
> >
> > Anthor example is GPU, if there are GPU0 and GPU1. Most case just GPU0
> > work. GPU1 may turn off when less loading.
> >
> > Ideally, thermal can get notification from power domain driver.
> > when such power domain turn off, disable thermal zone.
> >
> > So far, I have not idea how to do that.
>
> Ulf,
>
> do you have a guidance to link the thermal zone and the power domain in
> order to get a poweron/off notification leading to enable/disable the
> thermal zone ?

I don't know the details here, so apologize for my ignorance to start
with. What platform is this?

A vague idea could be to hook up the thermal sensor to the
corresponding CPU power domain. Assuming the CPU power domain is
modelled as a genpd provider, then this allows the driver for the
thermal sensor to register for power-on/off notifications of the genpd
(see dev_pm_genpd_add_notifier()).

Can this work?

Kind regards
Uffe