Re: [PATCH 1/1] thermal: sysfs: avoid actual readings from sysfs

From: Eduardo Valentin
Date: Wed Jun 07 2023 - 12:29:48 EST


Rui!

Long time no chatting! In this case, no email exchange. Good to hear from you.

On Wed, Jun 07, 2023 at 06:32:46AM +0000, Zhang, Rui wrote:
>
>
>
> On Tue, 2023-06-06 at 17:37 -0700, Eduardo Valentin wrote:
> > From: Eduardo Valentin <eduval@xxxxxxxxxx>
> >
> > As the thermal zone caches the current and last temperature
> > value, the sysfs interface can use that instead of
> > forcing an actual update or read from the device.
> > This way, if multiple userspace requests are coming
> > in, we avoid storming the device with multiple reads
> > and potentially clogging the timing requirement
> > for the governors.
> >
> > Cc: "Rafael J. Wysocki" <rafael@xxxxxxxxxx> (supporter:THERMAL)
> > Cc: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx> (supporter:THERMAL)
> > Cc: Amit Kucheria <amitk@xxxxxxxxxx> (reviewer:THERMAL)
> > Cc: Zhang Rui <rui.zhang@xxxxxxxxx> (reviewer:THERMAL)
> > Cc: linux-pm@xxxxxxxxxxxxxxx (open list:THERMAL)
> > Cc: linux-kernel@xxxxxxxxxxxxxxx (open list)
> >
> > Signed-off-by: Eduardo Valentin <eduval@xxxxxxxxxx>
> > ---
> > drivers/thermal/thermal_sysfs.c | 21 ++++++++++++++++-----
> > 1 file changed, 16 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/thermal/thermal_sysfs.c
> > b/drivers/thermal/thermal_sysfs.c
> > index b6daea2398da..a240c58d9e08 100644
> > --- a/drivers/thermal/thermal_sysfs.c
> > +++ b/drivers/thermal/thermal_sysfs.c
> > @@ -35,12 +35,23 @@ static ssize_t
> > temp_show(struct device *dev, struct device_attribute *attr, char
> > *buf)
> > {
> > struct thermal_zone_device *tz = to_thermal_zone(dev);
> > - int temperature, ret;
> > -
> > - ret = thermal_zone_get_temp(tz, &temperature);
> > + int temperature;
> >
> > - if (ret)
> > - return ret;
> > + /*
> > + * don't force new update from external reads
> > + * This way we avoid messing up with time constraints.
> > + */
> > + if (tz->mode == THERMAL_DEVICE_DISABLED) {
> > + int r;
> > +
> > + r = thermal_zone_get_temp(tz, &temperature); /* holds
> > tz->lock*/
>
> what is the expected behavior of a disabled zone?
>
> IMO, the hardware may not be functional at this point, and reading the
> temperature should be avoided, as we do in
> __thermal_zone_device_update().
>
> should we just return failure in this case?
>
> userspace should poke the temp attribute for enabled zones only.

While I see your point, my understanding is that the thermal zone mode
is either kernel mode or userspace mode, which, in my interpretation,
dictates where the control is, not necessarily that there is a
malfunction.

From that perspective, the expected behavior here is to have an
in-userspace control/governor, which:
1. Disables the in-kernel control.
2. Monitors the thermal zone by reading the /temp property.
3. Actuates the assigned cooling devices for the thermal zone.

The above setup works pretty well for non-critical control, where
the system design or state does not require an in-kernel control.
In that scenario, the cached value used by this patch will not be updated,
given that the in-kernel thread is no longer collecting/updating temperature
values; therefore, the sysfs entry has to talk to the
driver to get the most current value.
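
To make the userspace governor flow concrete, here is a minimal sketch of
one control step, not a real governor. The function names, the linear
policy, and the thresholds are hypothetical; the only thing assumed from
the kernel side is that the attributes hold plain decimal integers, with
temp in millidegrees Celsius.

```c
#include <stdio.h>

/* Write a short string to a sysfs-style attribute; 0 on success, -1 on failure. */
static int write_attr(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(val, f) >= 0;
	if (fclose(f) != 0)	/* sysfs write errors may only surface at flush time */
		ok = 0;
	return ok ? 0 : -1;
}

/* Read one decimal integer from an attribute; 0 on success, -1 on failure. */
static int read_int_attr(const char *path, int *out)
{
	FILE *f = fopen(path, "r");
	int n;

	if (!f)
		return -1;
	n = fscanf(f, "%d", out);
	fclose(f);
	return n == 1 ? 0 : -1;
}

/*
 * Hypothetical policy: map the temperature linearly onto cooling states
 * 0..max_state between two thresholds (all in millidegrees Celsius).
 */
static int pick_state(int temp_mc, int low_mc, int high_mc, int max_state)
{
	if (temp_mc <= low_mc)
		return 0;
	if (temp_mc >= high_mc)
		return max_state;
	return (int)((long long)(temp_mc - low_mc) * max_state /
		     (high_mc - low_mc));
}

/* One iteration: read the zone temperature, then set the cooling state. */
static int governor_step(const char *temp_path, const char *cdev_path,
			 int low_mc, int high_mc, int max_state)
{
	char val[16];
	int temp_mc;

	if (read_int_attr(temp_path, &temp_mc))
		return -1;	/* read failed: leave the cooling device alone */
	snprintf(val, sizeof(val), "%d\n",
		 pick_state(temp_mc, low_mc, high_mc, max_state));
	return write_attr(cdev_path, val);
}
```

On a real system, and assuming zone 0 and cooling device 0 are the right
pair, a governor would first write "disabled" to
/sys/class/thermal/thermal_zone0/mode with write_attr(), then call
governor_step() periodically with
/sys/class/thermal/thermal_zone0/temp and
/sys/class/thermal/cooling_device0/cur_state.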

For the failure case you referred to, Rui, this patch handles it
too. It will talk to the driver and, if the device is malfunctioning, the
driver will return an error, which will be reported back
to userspace as an error code upon read. That is the expected behavior,
so that userspace knows there is a problem.
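
As an illustration of what userspace sees in that case, here is a minimal
sketch of a reader that turns a failed read(2) on the temp attribute into
a negative errno value. The helper name read_temp_mc() is hypothetical,
and which errno a given driver returns is not something this sketch
assumes.

```c
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Read a temperature attribute in millidegrees Celsius.
 * Returns 0 on success, or a negative errno value on failure, including
 * whatever error the kernel reported for the read itself.
 */
static int read_temp_mc(const char *path, int *temp_mc)
{
	char buf[32];
	char *end;
	ssize_t n;
	long v;
	int err;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -errno;

	n = read(fd, buf, sizeof(buf) - 1);
	err = n < 0 ? -errno : 0;	/* capture errno before close() can change it */
	close(fd);
	if (err)
		return err;
	if (n == 0)
		return -EINVAL;		/* empty attribute: nothing to parse */

	buf[n] = '\0';
	errno = 0;
	v = strtol(buf, &end, 10);
	if (end == buf || errno == ERANGE || v < INT_MIN || v > INT_MAX)
		return -EINVAL;

	*temp_mc = (int)v;
	return 0;
}
```

A userspace governor can then treat any negative return as "the zone is
not readable right now" and fall back to a safe cooling state instead of
acting on a stale value.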

>
> thanks,
> rui
> > + if (r)
> > + return r;
> > + } else {
> > + mutex_lock(&tz->lock);
> > + temperature = tz->temperature;
> > + mutex_unlock(&tz->lock);
> > + }
> >
> > return sprintf(buf, "%d\n", temperature);
> > }
>

--
All the best,
Eduardo Valentin