Re: [PATCH 1/1] thermal: sysfs: avoid actual readings from sysfs

From: Daniel Lezcano
Date: Sat Jul 01 2023 - 03:28:40 EST



Eduardo,

On 01/07/2023 03:49, Eduardo Valentin wrote:

[ ... ]

All that are hypothesis, that is why having a real use case would help
to figure out the temperature limit drift at mitigation time.

Yeah, I guess the problem here is that you are assuming I2C is not a real
use case, not sure why. But it is and very common design in fact.

If it is so common you should be able to reproduce the issue and give numbers. At this point, what I read is "that may happen because I2C is slow and we may monitor it at an insane rate, so let's cache the value".

Assuming it is really needed, I'm not sure that should be exported via
sysfs. It is a driver issue and it may register the thermal zone with a
parameter telling the userspace rate limit.

On the other side, hwmon and thermal are connected. hwmon drivers
register a thermal zone and thermal drivers add themselves in the hwmon
sysfs directory. The temperature cache is handled in the driver level in
the hwmon subsystems and we want to handle the temperature cache at the
thermal sysfs level. How will we cope with this inconsistency?

Yeah, I do not see this, again, as where to handle cache type of design problem only.
This is really a protective / defensive code on the thermal core to avoid
userspace interfering on a kernel based control.


I agree that drivers may be free to go and defend themselves against
too frequent userspace requests, like they do, as you already shared
a link in another email. But saying that it is up to the driver to do this
is basically saying that the thermal subsystem do not care about their
own threads being delayed by a too frequent reads on a sysfs entry
created by the thermal subsystem, just because it is drivers responsability
to cache. To that is a missing defensive code.

No, the core code has not to be defensive against bad hardware design.

If multiple processes are reading in an infinite loop the temperature, they will constantly take the lock, and as the monitoring thread is a CFS task, this one will be considered as the readers and be delayed, with probably a mitigation temperature drift. Here we have a missing defensive / optimized code against a DoS but it is unrelated to the hardware and the fix is not caching the value.

As a side note, slow drivers are usually going under drivers/hwmon.

Have you seen this code?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/hwmon/lm75.c#n517
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/hwmon/hwmon.c#n850

Yes, and ?

That is what I said, the hwmon and the thermal zone are connected.

I also do not understand when you say slow drivers are usually going under
drivers/hwmon, does it really matter? One can design a thermal zone
that is connected to a hwmon device as input. Why would that be illogical?

I'm not saying it is illogical. I'm pointing the slow drivers are under hwmon subsystems and usually don't aim in-kernel mitigation. The get_temp ops is going through hwmon and the drivers may cache the values. So *if* there is an in-kernel mitigation, the value will be already cached usually.

I do believe I raised some valid concerns regarding the approach. Could please take them into account instead of eluding them?

1. A real use case with temperature limit drift (easy to reproduce because it is very common)

2. How about the consistency between hwmon and thermal? (one driver but two ways to access the temperature - one may cache and the other not)

Another question regarding the I2C example, if another subsystem is using the I2C, won't it take the bus lock and create the contention also? I mean it is possible to create a mitigation drift by reading constantly another sensor value (eg. voltage or whatever) ?


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog