Re: [PATCH 1/1] thermal/drivers/imx_sc_thermal: return -EAGAIN when SCFW turn off resource

From: Frank Li
Date: Thu Aug 17 2023 - 11:31:56 EST


On Wed, Aug 16, 2023 at 11:23:17PM +0200, Ulf Hansson wrote:
> On Wed, 16 Aug 2023 at 22:46, Daniel Lezcano <daniel.lezcano@xxxxxxxxxx> wrote:
> >
> > On 16/08/2023 19:07, Frank Li wrote:
> > > On Wed, Aug 16, 2023 at 06:47:17PM +0200, Daniel Lezcano wrote:
> > >> On 16/08/2023 18:28, Frank Li wrote:
> > >>> On Wed, Aug 16, 2023 at 10:44:32AM +0200, Daniel Lezcano wrote:
> > >>>>
> > >>>> Hi Frank,
> > >>>>
> > >>>> sorry for the delay
> > >>>>
> > >>>> On 14/07/2023 19:19, Frank Li wrote:
> > >>>>> On Thu, Jul 13, 2023 at 02:49:54PM +0200, Daniel Lezcano wrote:
> > >>>>>> On 12/07/2023 23:05, Frank Li wrote:
> > >>>>>>> Avoid endless print following message when SCFW turns off resource.
> > >>>>>>> [ 1818.342337] thermal thermal_zone0: failed to read out thermal zone (-1)
> > >>>>>>>
> > >>>>>>> Signed-off-by: Frank Li <Frank.Li@xxxxxxx>
> > >>>>>>> ---
> > >>>>>>> drivers/thermal/imx_sc_thermal.c | 4 +++-
> > >>>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
> > >>>>>>>
> > >>>>>>> diff --git a/drivers/thermal/imx_sc_thermal.c b/drivers/thermal/imx_sc_thermal.c
> > >>>>>>> index 8d6b4ef23746..0533d58f199f 100644
> > >>>>>>> --- a/drivers/thermal/imx_sc_thermal.c
> > >>>>>>> +++ b/drivers/thermal/imx_sc_thermal.c
> > >>>>>>> @@ -58,7 +58,9 @@ static int imx_sc_thermal_get_temp(struct thermal_zone_device *tz, int *temp)
> > >>>>>>> hdr->size = 2;
> > >>>>>>> ret = imx_scu_call_rpc(thermal_ipc_handle, &msg, true);
> > >>>>>>> - if (ret)
> > >>>>>>> + if (ret == -EPERM) /* NO POWER */
> > >>>>>>> + return -EAGAIN;
> > >>>>>>
> > >>>>>> Isn't there a chain call somewhere when the resource is turned off, so the
> > >>>>>> thermal zone can be disabled?
> > >>>>>
> > >>>>> A possible place is drivers/firmware/imx/scu-pd.c, but I am not sure how to
> > >>>>> get the thermal devices there. I just found an API, thermal_zone_get_zone_by_name(). I
> > >>>>> am not sure it is good to depend on "name": that adds coupling between the
> > >>>>> two drivers, and if an external thermal device (such as) has the
> > >>>>> same name, the wrong zone will be turned off.
> > >>>>
> > >>>> Correct
> > >>>>
> > >>>>> If I add a power domain notification in the thermal driver, I am not sure how to get
> > >>>>> other devices' pd in the thermal driver.
> > >>>>>
> > >>>>> Is there any example I can refer to?
> > >>>>>
> > >>>>> Or is this a simple enough solution?
> > >>>>
> > >>>> The solution works for removing the error message but it does not solve the
> > >>>> root cause of the issue. The thermal zone keeps monitoring while the sensor
> > >>>> is down.
> > >>>>
> > >>>> So the question is why the sensor is shut down if it is in use?
> > >>>
> > >>> Do you know of any code I can reference? I suppose it is quite common.
> > >>
> > >> Sorry, I don't get your comment
> > >>
> > >> What I meant is why is the sensor turned off if it is in use ?
> > >
> > > One typical example is CPU hotplug. The sensor is located in the CPU power domain.
> > > If a CPU is hotplugged off, the CPU power domain will be turned off.
> > >
> > > It doesn't make sense to keep monitoring such a sensor when the CPU is already powered off.
> > > It doesn't make sense to keep the CPU powered on just to get sensor
> > > data.
> > >
> > > Another example is the GPU: if there are GPU0 and GPU1, in most cases just GPU0
> > > works. GPU1 may be turned off when the load is low.
> > >
> > > Ideally, thermal can get a notification from the power domain driver:
> > > when such a power domain turns off, disable the thermal zone.
> > >
> > > So far, I have no idea how to do that.
> >
> > Ulf,
> >
> > do you have a guidance to link the thermal zone and the power domain in
> > order to get a poweron/off notification leading to enable/disable the
> > thermal zone ?
>
> I don't know the details here, so apologize for my ignorance to start
> with. What platform is this?

i.MX8QM.

>
> A vague idea could be to hook up the thermal sensor to the
> corresponding CPU power domain. Assuming the CPU power domain is
> modelled as a genpd provider, then this allows the driver for the
> thermal sensor to register for power-on/off notifications of the genpd
> (see dev_pm_genpd_add_notifier()).
>
> Can this work?

I don't think so. dev_pm_genpd_add_notifier() needs a dev that is bound to the pd.

tsens: thermal-sensor {
compatible = "fsl,imx-sc-thermal";
tsens-num = <6>;
#thermal-sensor-cells = <1>;
};

We have 6 thermal sensors, which are associated with 6 pds:
IMX_SC_R_SYSTEM, IMX_SC_R_PMIC_0,
IMX_SC_R_AP_0, IMX_SC_R_AP_1,
IMX_SC_R_GPU_0_PID0, IMX_SC_R_GPU_1_PID0,
IMX_SC_R_DRC_0

We don't want to keep a PD on just to read the temperature. The GPU pd
consumes a lot of power.

I want to register one callback in the thermal-sensor driver: when the GPU pd
turns on, enable the thermal zone; when the GPU pd turns off, disable the thermal zone.

We could do this in a more generic way:

gpu-thermal1 {
polling-delay-passive = <250>;
polling-delay = <2000>;
>>> pd=<&GPU1_PD>
thermal-sensors = <&tsens IMX_SC_R_GPU_1_PID0>;

};

If GPU1_PD is on, gpu-thermal1 is enabled;
if GPU1_PD is off, gpu-thermal1 is disabled.
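
To make the idea concrete, here is a minimal userspace C model of such a
callback. It is only a sketch, not driver code. The names pd_event,
notifier_model, zone_model and zone_pd_notify are made up for illustration.
In the kernel the matching pieces would presumably be the GENPD_NOTIFY_*
actions delivered through a notifier registered with
dev_pm_genpd_add_notifier(), plus thermal_zone_device_enable() and
thermal_zone_device_disable(). How each thermal zone would get a device
attached to its pd is still the open question.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel's GENPD_NOTIFY_* actions (illustrative only). */
enum pd_event {
	PD_NOTIFY_PRE_OFF,
	PD_NOTIFY_OFF,
	PD_NOTIFY_PRE_ON,
	PD_NOTIFY_ON,
};

#define MODEL_NOTIFY_OK 0

/* Simplified model of a notifier block: just one callback pointer. */
struct notifier_model {
	int (*call)(struct notifier_model *nb, enum pd_event ev);
};

/* Hypothetical per-zone state: one notifier embedded in each thermal zone. */
struct zone_model {
	struct notifier_model nb;
	const char *name;
	bool enabled;	/* models thermal zone enabled/disabled */
};

/* Recover the zone from its embedded notifier (container_of pattern). */
#define zone_from_nb(ptr) \
	((struct zone_model *)((char *)(ptr) - offsetof(struct zone_model, nb)))

static int zone_pd_notify(struct notifier_model *nb, enum pd_event ev)
{
	struct zone_model *z = zone_from_nb(nb);

	switch (ev) {
	case PD_NOTIFY_ON:	/* domain is up: sensor can be read again */
		z->enabled = true;
		break;
	case PD_NOTIFY_PRE_OFF:	/* domain about to go down: stop polling */
		z->enabled = false;
		break;
	default:		/* PRE_ON and OFF need no action in this model */
		break;
	}
	return MODEL_NOTIFY_OK;
}
```

The zone is disabled on PRE_OFF rather than OFF so that polling stops before
the sensor loses power. That ordering is my assumption.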

>
> Kind regards
> Uffe