Re: [RESEND PATCH v3 2/2] thermal: core: introduce thermal zone device mode control

From: Matthias Kaehlcke
Date: Tue Jul 03 2018 - 13:13:39 EST


On Thu, Jun 28, 2018 at 05:33:02PM -0700, Matthias Kaehlcke wrote:
> Hi,
>
> I stumbled across this patch since I'm currently poking around with
> early thermal bringup on a platform and this patch has been integrated
> in our development tree.
>
> I'm seeing some unexpected behaviors, which could entirely due to
> wrong expectation from my side. I only have some basic working
> knowledge of the thermal framework, just want to double check and
> perhaps learn a thing or two.
>
> On Mon, Feb 26, 2018 at 03:41:18PM +0100, Enric Balletbo i Serra wrote:
> > From: Zhang Rui <rui.zhang@xxxxxxxxx>
> >
> > Thermal "mode" sysfs attribute is introduced to enable/disable a thermal
> > zone, and .get_mode()/.set_mode() callback is introduced for platform
> > thermal driver to enable/disable the hardware thermal control logic. And
> > thermal core takes no action upon thermal zone enable/disable.
> >
> > Actually, this is not quite right because thermal core still pokes those
> > disabled thermal zones occasionally, e.g. upon system resume.
> >
> > To fix this, a new flag 'mode' is introduced in struct thermal_zone_device
> > to represent the thermal zone mode, and several decisions have been made
> > based on this flag, including
> > 1. check the thermal zone mode right after it's registered.
> > 2. skip updating thermal zone if the zone is disabled
> > 3. stop the polling timer when the thermal zone is disabled
> >
> > Note: basically, this patch doesn't affect the existing always-enabled
> > thermal zones much, with just one exception -
> > thermal zone .get_mode() must be well prepared to reflect the real thermal
> > zone status upon the thermal zone registration.
>
> From my perspective this looks like a pretty significant change. For
> the platform I'm working on I added a thermal zone to the device tree,
> with the expectation that it would be enabled. Judging from the code
> without this patch this expectation seems to be naive, since
> of-thermal.c sets tz->mode to THERMAL_DEVICE_DISABLED, so apparently
> either userspace or some driver should call _set_mode(tz,
> THERMAL_DEVICE_ENABLED). However even without this the thermal zone
> appears to be active (I didn't really test end-to-end yet, but at
> least thermal_zone_device_update() is called and calls
> handle_thermal_trip()). Not sure why this is the case with
> THERMAL_DEVICE_DISABLED, but before I learned about the existence of
> the flag my expectation was that the zone would be enabled.
>
> With this patch thermal_zone_device_update() is skipped if the zone
> hasn't been explictly enabled, which may be consistent with the state
> of 'tz->mode', but effectively changes the previous/current behavior.
>
> Not sure if I'm just dumbly overlooking something obvious or if there
> is an actual problem with of_thermal (and maybe others).

The problem is that there are now two 'mode' fields, tzd->mode and the
other typically tzd->devdata->mode, and tzd->mode is never set to enabled.

> thermal zone .get_mode() must be well prepared to reflect the real thermal
> zone status upon the thermal zone registration.

For of_thermal tzd->mode is initialized with the result of .get_mode()
when the zone is registered. At this time no sensor has been added
to the zone, hence the zone is disabled. When a sensor is added later
tzd->devdata->mode is set to enabled, however tzd->mode remains disabled:

tzd->mode = DISABLED
tzd->devdata->mode = DISABLED

of_parse_thermal_zones
thermal_zone_device_register
tzd->mode = tzd->get_mode() // => DISABLED

<sensor>_probe
thermal_zone_of_sensor_register
tzd->set_mode(THERMAL_DEVICE_ENABLED)
tzd->devdata->mode = ENABLED

One way to fix this for of_thermal could be to setting tzd->mode in
.set_mode() in addition to setting tzd->devdata->mode. However this
feels like a workaround/hack. Personally I find it confusing to have
two mode fields for a thermal zone device. Maybe tzd->mode should
replace tzd->devdata->mode?