Re: [PATCH v4 01/30] thermal/core: Add a generic thermal_zone_get_trip() function

From: Daniel Lezcano
Date: Sat Sep 24 2022 - 13:46:42 EST


On 24/09/2022 00:19, Marek Szyprowski wrote:
Hi Daniel,

On 21.09.2022 11:42, Daniel Lezcano wrote:
The thermal_zone_device_ops structure defines a family of ops,
get_trip_temp(), get_trip_hyst() and get_trip_type(), each of which
returns one property of a trip point.

As a result, the code calls these ops everywhere to get a trip point,
which is supposed to be defined in the backend driver. This makes no
sense, as a thermal trip can be generic and used by the backend
driver to declare its trip points.

Part of the thermal framework has been changed and all the OF thermal
drivers are using the same definition for the trip point and use a
thermal zone registration variant to pass those trip points which are
part of the thermal zone device structure.

Consequently, we can use a generic function to get the trip points
when they are stored in the thermal zone device structure.

This approach can be generalized to all the drivers, and we can get rid
of the ops->get_trip_*. That will result in much simpler code and make
it possible to rework how the thermal trips are handled in the
thermal core framework, as discussed previously.

This change adds a function thermal_zone_get_trip() where we get the
thermal trip point structure which contains all the properties (type,
temp, hyst) instead of doing multiple calls to ops->get_trip_*.

That opens the door for trip point extension with more attributes. For
instance, replacing the trip points disabled bitmask with a 'disabled'
field in the structure.

Here we replace all the calls to ops->get_trip_* in the thermal core
code with a call to the thermal_zone_get_trip() function.

While at it, add thermal_zone_get_num_trips() to encapsulate the code
further and reduce the dependency on the thermal framework internals.
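
To make the idea above concrete, here is a minimal userspace sketch of
a generic trip accessor. It is not the code from the patch: the struct
layouts, the field names (temperature, hysteresis) and the error
handling are assumptions chosen for illustration. Only the function
names thermal_zone_get_trip() and thermal_zone_get_num_trips() and the
three properties (type, temp, hyst) come from the description.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types (assumed layout). */
enum thermal_trip_type {
	THERMAL_TRIP_ACTIVE,
	THERMAL_TRIP_PASSIVE,
	THERMAL_TRIP_HOT,
	THERMAL_TRIP_CRITICAL,
};

/* One trip point carries all of its properties in a single struct. */
struct thermal_trip {
	int temperature;		/* millidegrees Celsius */
	int hysteresis;			/* millidegrees Celsius */
	enum thermal_trip_type type;
};

/* The trip array lives in the thermal zone, not in the backend driver. */
struct thermal_zone_device {
	struct thermal_trip *trips;
	int num_trips;
};

/* Hide the field access so callers do not touch the zone internals. */
int thermal_zone_get_num_trips(const struct thermal_zone_device *tz)
{
	assert(tz != NULL);	/* passing NULL is a programming error here */
	return tz->num_trips;
}

/*
 * Copy the whole trip description in one call, replacing separate
 * get_trip_temp() / get_trip_hyst() / get_trip_type() calls.
 * Returns 0 on success, -EINVAL on a bad argument or index.
 */
int thermal_zone_get_trip(const struct thermal_zone_device *tz,
			  int trip_id, struct thermal_trip *trip)
{
	if (!tz || !trip || trip_id < 0 || trip_id >= tz->num_trips)
		return -EINVAL;

	*trip = tz->trips[trip_id];
	return 0;
}
```

A caller would loop from 0 to thermal_zone_get_num_trips(tz) - 1 and
read type, temperature and hysteresis from the returned structure,
instead of invoking three separate ops per trip point.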

Signed-off-by: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>

This patch landed in linux next-20220923 as commit 78ffa3e58d93
("thermal/core: Add a generic thermal_zone_get_trip() function").
Unfortunately it introduces a deadlock:

thermal_zone_device_update() calls handle_thermal_trip() with tz->lock
held, which in turn calls thermal_zone_get_trip(), which tries to take
tz->lock again. I've tried to fix this by switching
handle_thermal_trip() to call __thermal_zone_get_trip().

This fixes the issue in this change, but then I tried to apply it on
top of linux next-20220923. Unfortunately it fails again. It looks
like the other changes also assume that calling
thermal_zone_get_trip() is possible with tz->lock held, because in my
case it turned out that handle_non_critical_trips() called
step_wise_throttle(), which in turn called thermal_zone_get_trip(). I
gave up fixing this. Please re-check the possible call paths and adjust
the locking accordingly.

Yes, you are correct. Those paths have the lock held. I'm fixing this.
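
For readers following the thread, the locking problem and the
locked/unlocked split mentioned above can be sketched in userspace as
follows. This is an illustration under stated assumptions, not the
eventual fix: struct toy_lock, toy_lock_acquire(), toy_lock_release()
and the two update_*() helpers are hypothetical, and the toy lock
returns -EDEADLK on a second acquisition where the real tz->lock mutex
would simply hang. Only thermal_zone_get_trip() and
__thermal_zone_get_trip() are names taken from the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy non-recursive lock standing in for tz->lock (hypothetical). */
struct toy_lock {
	bool held;
};

static int toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return -EDEADLK;	/* a real mutex would block forever */
	l->held = true;
	return 0;
}

static void toy_lock_release(struct toy_lock *l)
{
	l->held = false;
}

struct thermal_trip {
	int temperature;
	int hysteresis;
};

struct thermal_zone_device {
	struct toy_lock lock;
	struct thermal_trip *trips;
	int num_trips;
};

/* Unlocked variant: the caller must already hold tz->lock. */
int __thermal_zone_get_trip(struct thermal_zone_device *tz, int trip_id,
			    struct thermal_trip *trip)
{
	assert(tz->lock.held);
	if (!trip || trip_id < 0 || trip_id >= tz->num_trips)
		return -EINVAL;
	*trip = tz->trips[trip_id];
	return 0;
}

/* Locked variant: for callers that do NOT hold tz->lock. */
int thermal_zone_get_trip(struct thermal_zone_device *tz, int trip_id,
			  struct thermal_trip *trip)
{
	int ret = toy_lock_acquire(&tz->lock);

	if (ret)
		return ret;
	ret = __thermal_zone_get_trip(tz, trip_id, trip);
	toy_lock_release(&tz->lock);
	return ret;
}

/* Models the reported bug: locked getter called with the lock held. */
int update_with_locked_getter(struct thermal_zone_device *tz)
{
	struct thermal_trip trip;
	int ret = toy_lock_acquire(&tz->lock);

	if (ret)
		return ret;
	ret = thermal_zone_get_trip(tz, 0, &trip);	/* second acquisition */
	toy_lock_release(&tz->lock);
	return ret;
}

/* Models the intended pattern: unlocked getter under the lock. */
int update_with_unlocked_getter(struct thermal_zone_device *tz)
{
	struct thermal_trip trip;
	int ret = toy_lock_acquire(&tz->lock);

	if (ret)
		return ret;
	ret = __thermal_zone_get_trip(tz, 0, &trip);
	toy_lock_release(&tz->lock);
	return ret;
}
```

The point of the split is that every function reachable from a path
that already holds the lock, such as a governor's throttle callback in
the report above, has to use the unlocked variant, so the full call
graph needs auditing rather than a single call site.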


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs