[PATCH 1/1] thermal: sysfs: avoid actual readings from sysfs

From: Eduardo Valentin
Date: Fri Jun 30 2023 - 21:58:29 EST


Hello,

On Fri, Jun 30, 2023 at 02:09:44PM +0200, Daniel Lezcano wrote:
>
>
>
> On 30/06/2023 12:46, Rafael J. Wysocki wrote:
> > Hi Daniel,
> >
> > On Fri, Jun 30, 2023 at 12:11 PM Daniel Lezcano
> > <daniel.lezcano@xxxxxxxxxx> wrote:
> > >
> > >
> > > Hi Rafael,
> > >
> > > On 30/06/2023 10:16, Rafael J. Wysocki wrote:
> > > > On Wed, Jun 28, 2023 at 11:10 PM Eduardo Valentin <evalenti@xxxxxxxxxx> wrote:
> > >
> > > [ ... ]
> > >
> > > > So what about adding a new zone attribute that can be used to specify
> > > > the preferred caching time for the temperature?
> > > >
> > > > That is, if the time interval between two consecutive updates of the
> > > > cached temperature value is less than the value of the new attribute,
> > > > the cached temperature value will be returned by "temp". Otherwise,
> > > > it will cause the sensor to be read and the value obtained from it
> > > > will be returned to user space and cached.
> > > >
> > > > If the value of the new attribute is 0, everything will work as it
> > > > does now (which will also need to be the default behavior).
> > >
> > > I'm still not convinced about the feature.
> > >
> > > Eduardo provided some numbers but they seem based on the characteristics
> > > of the I2C, not to a real use case. Eduardo?
> > >
> > > Before adding more complexity in the thermal framework and yet another
> > > sysfs entry, it would be interesting to have an experiment and show the
> > > impact of both configurations, not from a timing point of view but with
> > > a temperature mitigation accuracy.
> > >
> > > > Without a real use case, this feature does not really make sense IMO.
> >
> > I'm kind of unsure why you think that it is not a good idea in general
> > to have a way to limit the rate of accessing a temperature sensor, for
> > energy-efficiency reasons if nothing more.
>
> I don't think it is not a good idea. I've no judgement with the proposed
> change.
>
> But I'm not convinced it is really useful, that is why having a real use
> case and some numbers showing that feature solves the issue would be nice.
>
> It is illogical we want a fast and accurate response on a specific
> hardware and then design it with slow sensors and contention prone bus.

Totally agree, but at the same time, this is the real world :-)

>
> In Eduardo's example, we have a 100ms monitoring rate on an I2C bus. This
> rate is usually used to monitor CPUs with very fast transitions. With a
> remote site, the monitoring rate would be much slower, so if there is
> contention on the bus because a dumb process is constantly reading the
> temperature, then it should be negligible.
>
> All of these are hypotheses, that is why having a real use case would help
> to figure out the temperature limit drift at mitigation time.

Yeah, I guess the problem here is that you are assuming I2C is not a real
use case, and I am not sure why. It is one, and a very common design in fact.

>
> Assuming it is really needed, I'm not sure that should be exported via
> sysfs. It is a driver issue and it may register the thermal zone with a
> parameter telling the userspace rate limit.
>
> On the other side, hwmon and thermal are connected. hwmon drivers
> register a thermal zone and thermal drivers add themselves in the hwmon
> sysfs directory. The temperature cache is handled in the driver level in
> the hwmon subsystems and we want to handle the temperature cache at the
> thermal sysfs level. How will we cope with this inconsistency?

Yeah, again, I do not see this only as a question of where to handle the cache.
This is really protective / defensive code in the thermal core to keep
userspace from interfering with kernel-based control.


I agree that drivers are free to defend themselves against too-frequent
userspace requests, as some already do (you shared a link in another
email). But saying that it is up to the driver to do this is basically
saying that the thermal subsystem does not care about its own threads
being delayed by too-frequent reads of a sysfs entry created by the
thermal subsystem itself, just because it is the driver's responsibility
to cache. To me, that is missing defensive code.
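
For illustration only, the caching attribute Rafael proposed earlier in the
thread could behave roughly like the user-space C sketch below. Everything
in it is hypothetical: struct zone_model, the cache_time_ms field,
zone_get_temp() and fake_sensor() are made-up names for this example, not
existing thermal core API, and the real locking and sysfs plumbing are left
out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of a thermal zone, not the kernel struct. */
struct zone_model {
	int (*read_sensor)(void);	/* slow path, e.g. an I2C transaction */
	uint64_t cache_time_ms;		/* hypothetical attribute, 0 = no caching */
	uint64_t last_update_ms;	/* time of the last real sensor read */
	int cached_temp;		/* last value read, in millidegrees Celsius */
	bool valid;			/* false until the first sensor read */
};

/* Fake sensor for the example: counts reads and returns a rising value. */
static int sensor_reads;

static int fake_sensor(void)
{
	return 40000 + 1000 * sensor_reads++;
}

/*
 * Return the cached temperature if the last sensor read happened less than
 * cache_time_ms ago; otherwise read the sensor and refresh the cache.
 * With cache_time_ms == 0 every call reads the sensor (current behavior).
 */
static int zone_get_temp(struct zone_model *z, uint64_t now_ms)
{
	bool fresh = z->valid && z->cache_time_ms != 0 &&
		     (now_ms - z->last_update_ms) < z->cache_time_ms;

	if (!fresh) {
		z->cached_temp = z->read_sensor();
		z->last_update_ms = now_ms;
		z->valid = true;
	}

	return z->cached_temp;
}
```

With cache_time_ms set to 100, two reads 50 ms apart would touch the sensor
once. A userspace process polling "temp" in a tight loop would then cost at
most one bus transaction per interval in this model, instead of one per read.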

>
> As a side note, slow drivers are usually going under drivers/hwmon.

Have you seen this code?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/hwmon/lm75.c#n517
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/hwmon/hwmon.c#n850


I also do not understand your point that slow drivers usually go under
drivers/hwmon. Does it really matter? One can design a thermal zone
that is connected to a hwmon device as input. Why would that be illogical?



--
All the best,
Eduardo Valentin