Re: [PATCH 2/3] thermal: qoriq_thermal: only enable supported sensors

From: Sebastian Krzyszkowiak
Date: Wed Jun 07 2023 - 13:43:02 EST


On środa, 7 czerwca 2023 10:28:59 CEST Daniel Lezcano wrote:
> On 07/06/2023 08:01, Sebastian Krzyszkowiak wrote:
> > On piątek, 2 czerwca 2023 15:11:37 CEST Daniel Lezcano wrote:
> >> On 01/06/2023 11:52, Peng Fan wrote:
> >>> Hi Daniel,
> >>>
> >>>> Subject: RE: [PATCH 2/3] thermal: qoriq_thermal: only enable supported
> >>>> sensors
> >>>>
> >>>>> Subject: Re: [PATCH 2/3] thermal: qoriq_thermal: only enable supported
> >>>>> sensors
> >>>>>
> >>>>> On 31/05/2023 14:05, Peng Fan wrote:
> >>>>>>> Subject: Re: [PATCH 2/3] thermal: qoriq_thermal: only enable
> >>>>>>> supported sensors
> >>>>>>>
> >>>>>>> On 16/05/2023 10:37, Peng Fan (OSS) wrote:
> >>>>>>>> From: Peng Fan <peng.fan@xxxxxxx>
> >>>>>>>>
> >>>>>>>> There are MAX 16 sensors, but not all of them supported. Such as
> >>>>>>>> i.MX8MQ, there are only 3 sensors. Enabling all 16 sensors will
> >>>>>>>> touch reserved bits from i.MX8MQ reference mannual, and TMU will
> >>>>>>>> stuck, temperature will not update anymore.
> >>>>>>>>
> >>>>>>>> Fixes: 45038e03d633 ("thermal: qoriq: Enable all sensors before
> >>>>>>>> registering them")
> >>>>>>>> Signed-off-by: Peng Fan <peng.fan@xxxxxxx>
> >>>>>>>> ---
> >>>>>>>>
> >>>>>>>> drivers/thermal/qoriq_thermal.c | 30
> >>>>>>>> +++++++++++++++++++----------
> >>>>
> >>>> -
> >>>>
> >>>>>>>> 1 file changed, 19 insertions(+), 11 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/thermal/qoriq_thermal.c
> >>>>>>>> b/drivers/thermal/qoriq_thermal.c index
> >>>>
> >>>> b806a0929459..53748c4a5be1
> >>>>
> >>>>>>>> 100644
> >>>>>>>> --- a/drivers/thermal/qoriq_thermal.c
> >>>>>>>> +++ b/drivers/thermal/qoriq_thermal.c
> >>>>>>>> @@ -31,7 +31,6 @@
> >>>>>>>>
> >>>>>>>> #define TMR_DISABLE 0x0
> >>>>>>>> #define TMR_ME 0x80000000
> >>>>>>>> #define TMR_ALPF 0x0c000000
> >>>>>>>>
> >>>>>>>> -#define TMR_MSITE_ALL GENMASK(15, 0)
> >>>>>>>>
> >>>>>>>> #define REGS_TMTMIR 0x008 /* Temperature measurement
> >>>>>>>
> >>>>>>> interval Register */
> >>>>>>>
> >>>>>>>> #define TMTMIR_DEFAULT 0x0000000f
> >>>>>>>>
> >>>>>>>> @@ -105,6 +104,11 @@ static int tmu_get_temp(struct
> >>>>>>>
> >>>>>>> thermal_zone_device *tz, int *temp)
> >>>>>>>
> >>>>>>>> * within sensor range. TEMP is an 9 bit value
representing
> >>>>>>>> * temperature in KelVin.
> >>>>>>>> */
> >>>>>>>>
> >>>>>>>> +
> >>>>>>>> + regmap_read(qdata->regmap, REGS_TMR, &val);
> >>>>>>>> + if (!(val & TMR_ME))
> >>>>>>>> + return -EAGAIN;
> >>>>>>>
> >>>>>>> How is this change related to what is described in the changelog?
> >>>>>>
> >>>>>> devm_thermal_zone_of_sensor_register will invoke get temp, since we
> >>>>>> reverted the 45038e03d633 did, we need to check TMR_ME to avoid
> >>>>>
> >>>>> return
> >>>>>
> >>>>>> invalid temperature.
> >>>>>>
> >>>>> From a higher perspective if the sensor won't be enabled, then the
> >>>>>
> >>>>> thermal zone should not be registered, the get_temp won't happen on a
> >>>>> disabled sensor and this test won't be necessary, no ?
> >>>
> >>> After thinking more, I'd prefer current logic.
> >>>
> >>> We rely on devm_thermal_of_zone_register's return value to know
> >>> whether there is a valid zone, then set sites bit, and after collected
> >>> all site bits, we enable the thermal IP.
> >>>
> >>> If move the enabling thermal IP before devm_thermal_of_zone_register,
> >>> We need check dtb thermal zone, to know which zone is valid for current
> >>> thermal IP. This would complicate the design.
> >>>
> >>> So just checking the enabling bit in get temperature would be much
> >>> simpler, and there just a small window before enabling thermal IP.
> >>
> >> If the thermal zone is not described, then the thermal zone won't be
> >> created as it fails with -ENODEV and thus get_temp won't be called on a
> >> disabled site, right?
> >
> > That's not what the problem is. It's about zones that *will* be created -
> > since the driver only knows that a thermal zone isn't described in the
> > device tree after it fails registering, it can't enable the site *before*
> > the zone gets registered - so it happens afterwards. That's why it needs
> > this check to not return a bogus initial value before the site gets
> > actually enabled later in qoriq_tmu_register_tmu_zone.
>
> Sorry, I get the point but I don't see how that can happen:
>
> qoriq_tmu_register_tmu_zone() calls devm_thermal_of_zone_register() for
> *all* sites regardless if they really exists or not.
>
> Under the hood, the function devm_thermal_of_zone_register() calls
> thermal_of_zone_register(). This one fails when calling
> of_thermal_zone_find() because it does not exist and returns -ENODEV.
>
> Hence, the thermal_zone_device_register_with_trips() is not called, the
> thermal zone is not created neither updated.

Again - that's not the case the check is there for. It's there for zones that
do exist and that do get registered, because REGS_TMR only gets set *after*
all the zones are already registered (the driver as it is right now does not
know which sites it should enable before registering the zones). Because of
that, the first value a zone gets after being registered is always bogus,
because no monitoring site has been enabled yet at all.

> So I don't understand why the test:
>
> + regmap_read(qdata->regmap, REGS_TMR, &val);
> + if (!(val & TMR_ME))
> + return -EAGAIN;
>
> is needed in the get_temp() ops as the thermal zone for this disabled
> site should not exist.
>
> I'm not putting in question the series, just wanting to avoid a
> potential pointless check in an ops.

It's definitely not pointless (it does workaround a real issue). Is it elegant?
IMO bringing thermal_zone_of_get_sensor_id back (or doing something
equivalent) instead would be much cleaner:)

Cheers,
Sebastian