Re: [PATCH 1/1] thermal/drivers/imx_sc_thermal: return -EAGAIN when SCFW turn off resource

From: Ulf Hansson
Date: Thu Aug 17 2023 - 17:42:18 EST


On Thu, 17 Aug 2023 at 17:31, Frank Li <Frank.li@xxxxxxx> wrote:
>
> On Wed, Aug 16, 2023 at 11:23:17PM +0200, Ulf Hansson wrote:
> > On Wed, 16 Aug 2023 at 22:46, Daniel Lezcano <daniel.lezcano@xxxxxxxxxx> wrote:
> > >
> > > On 16/08/2023 19:07, Frank Li wrote:
> > > > On Wed, Aug 16, 2023 at 06:47:17PM +0200, Daniel Lezcano wrote:
> > > >> On 16/08/2023 18:28, Frank Li wrote:
> > > >>> On Wed, Aug 16, 2023 at 10:44:32AM +0200, Daniel Lezcano wrote:
> > > >>>>
> > > >>>> Hi Frank,
> > > >>>>
> > > >>>> sorry for the delay
> > > >>>>
> > > >>>> On 14/07/2023 19:19, Frank Li wrote:
> > > >>>>> On Thu, Jul 13, 2023 at 02:49:54PM +0200, Daniel Lezcano wrote:
> > > >>>>>> On 12/07/2023 23:05, Frank Li wrote:
> > > >>>>>>> Avoid endlessly printing the following message when SCFW turns off the resource.
> > > >>>>>>> [ 1818.342337] thermal thermal_zone0: failed to read out thermal zone (-1)
> > > >>>>>>>
> > > >>>>>>> Signed-off-by: Frank Li <Frank.Li@xxxxxxx>
> > > >>>>>>> ---
> > > >>>>>>> drivers/thermal/imx_sc_thermal.c | 4 +++-
> > > >>>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
> > > >>>>>>>
> > > >>>>>>> diff --git a/drivers/thermal/imx_sc_thermal.c b/drivers/thermal/imx_sc_thermal.c
> > > >>>>>>> index 8d6b4ef23746..0533d58f199f 100644
> > > >>>>>>> --- a/drivers/thermal/imx_sc_thermal.c
> > > >>>>>>> +++ b/drivers/thermal/imx_sc_thermal.c
> > > >>>>>>> @@ -58,7 +58,9 @@ static int imx_sc_thermal_get_temp(struct thermal_zone_device *tz, int *temp)
> > > >>>>>>> hdr->size = 2;
> > > >>>>>>> ret = imx_scu_call_rpc(thermal_ipc_handle, &msg, true);
> > > >>>>>>> - if (ret)
> > > >>>>>>> + if (ret == -EPERM) /* NO POWER */
> > > >>>>>>> + return -EAGAIN;
> > > >>>>>>
> > > >>>>>> Isn't there a chain call somewhere when the resource is turned off, so the
> > > >>>>>> thermal zone can be disabled?
> > > >>>>>
> > > >>>>> A possible place is drivers/firmware/imx/scu-pd.c, but I am not sure how to
> > > >>>>> get the thermal devices there. I just found an API, thermal_zone_get_zone_by_name().
> > > >>>>> I am not sure it is good to depend on "name": it adds coupling between the
> > > >>>>> two drivers, and if an external thermal device has the same name, the
> > > >>>>> wrong zone will be turned off.
> > > >>>>
> > > >>>> Correct
> > > >>>>
> > > >>>>> If I add a power domain notification in the thermal driver, I am not sure
> > > >>>>> how to get the other devices' pd in the thermal driver.
> > > >>>>>
> > > >>>>> Is there any example I can refer to?
> > > >>>>>
> > > >>>>> Or is this a simple enough solution?
> > > >>>>
> > > >>>> The solution works for removing the error message but it does not solve the
> > > >>>> root cause of the issue. The thermal zone keeps monitoring while the sensor
> > > >>>> is down.
> > > >>>>
> > > >>>> So the question is why the sensor is shut down if it is in use?
> > > >>>
> > > >>> Do you know of any code I can use as a reference? I suppose this is quite common.
> > > >>
> > > >> Sorry, I don't get your comment
> > > >>
> > > >> What I meant is why is the sensor turned off if it is in use ?
> > > >
> > > > One typical example is CPU hotplug. The sensor is located in the CPU power
> > > > domain. If the CPU is hot-unplugged, the CPU power domain will be turned off.
> > > >
> > > > It doesn't make sense to keep monitoring such a sensor when the CPU is
> > > > already powered off. It doesn't make sense to keep the CPU powered on just
> > > > to get sensor data.
> > > >
> > > > Another example is the GPU, if there are GPU0 and GPU1. In most cases only
> > > > GPU0 works; GPU1 may be turned off under light load.
> > > >
> > > > Ideally, thermal can get a notification from the power domain driver.
> > > > When such a power domain turns off, disable the thermal zone.
> > > >
> > > > So far, I have no idea how to do that.
> > >
> > > Ulf,
> > >
> > > do you have any guidance on how to link the thermal zone and the power
> > > domain in order to get a power on/off notification leading to
> > > enabling/disabling the thermal zone?
> >
> > I don't know the details here, so I apologize for my ignorance to start
> > with. What platform is this?
>
> i.MX8QM.

Thanks!

>
> >
> > A vague idea could be to hook up the thermal sensor to the
> > corresponding CPU power domain. Assuming the CPU power domain is
> > modelled as a genpd provider, then this allows the driver for the
> > thermal sensor to register for power-on/off notifications of the genpd
> > (see dev_pm_genpd_add_notifier()).
> >
> > Can this work?
>
> I don't think so. dev_pm_genpd_add_notifier() needs a dev that is bound to the pd.

Yes, correct.

>
> tsens: thermal-sensor {
> compatible = "fsl,imx-sc-thermal";
> tsens-num = <6>;
> #thermal-sensor-cells = <1>;
> };

Are you saying that the above doesn't have a corresponding struct
device created for it? That sounds like a problem that can be fixed,
right? Not sure if it makes sense though.

>
> We have 6 thermal sensors, which are associated with 6 pds:
> IMX_SC_R_SYSTEM, IMX_SC_R_PMIC_0,
> IMX_SC_R_AP_0, IMX_SC_R_AP_1,
> IMX_SC_R_GPU_0_PID0, IMX_SC_R_GPU_1_PID0,
> IMX_SC_R_DRC_0
>
> We don't want to keep a PD on just to read the temperature. The GPU pd
> consumes a lot of power.

Of course, that seems like a bad idea.

The corresponding struct device that's hooked up to a genpd can
remain runtime suspended for as long as you think it makes sense. Thus
it would not keep the PM domain powered on when it isn't needed.

>
> I want to register a callback in the thermal-sensor driver: when the GPU pd
> turns on, enable the thermal zone; when the GPU pd turns off, disable it.

Right, that should work fine too, I think. It seems like this is just
a matter of modelling this correctly in DT; I have no strong opinion
in this regard.
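
To make the enable/disable idea concrete, here is a rough userspace
model of the callback logic, not kernel code and untested on hardware.
The names model_zone, pd_event and model_pd_notify() are made up for
illustration. In a real driver I would assume the callback is a
notifier_block registered with dev_pm_genpd_add_notifier(), reacting to
GENPD_NOTIFY_PRE_OFF and GENPD_NOTIFY_ON and calling
thermal_zone_device_disable() / thermal_zone_device_enable().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for illustration only. The event names mirror
 * the ordering of genpd notifications, but nothing here is kernel API.
 */
enum pd_event {
	PD_EVENT_PRE_OFF,	/* domain is about to be powered off */
	PD_EVENT_OFF,		/* domain has been powered off */
	PD_EVENT_PRE_ON,	/* domain is about to be powered on */
	PD_EVENT_ON,		/* domain has been powered on */
};

struct model_zone {
	const char *name;
	bool enabled;		/* true while the zone may poll its sensor */
};

/*
 * Model of the notifier callback: stop polling before the power goes
 * away, and resume only once the domain is fully powered on again.
 */
static int model_pd_notify(struct model_zone *tz, enum pd_event ev)
{
	assert(tz != NULL);

	switch (ev) {
	case PD_EVENT_PRE_OFF:
		tz->enabled = false;	/* sensor is about to lose power */
		break;
	case PD_EVENT_ON:
		tz->enabled = true;	/* sensor is readable again */
		break;
	default:
		break;			/* OFF and PRE_ON need no action */
	}

	return 0;
}
```

Disabling at the "pre off" step rather than at "off" is a design
choice in this sketch: it avoids a window where the zone still polls a
sensor that has already lost power.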

>
> We can do it in a more common way.
>
> gpu-thermal1 {
> polling-delay-passive = <250>;
> polling-delay = <2000>;
> >>> pd=<&GPU1_PD>;
> thermal-sensors = <&tsens IMX_SC_R_GPU_1_PID0>;
>
> };
>
> If GPU1_PD is on, then gpu-thermal1 is enabled;
> if GPU1_PD is off, then gpu-thermal1 is disabled.
>

Sounds like it's worth a try! Please keep me posted.

Kind regards
Uffe