Re: [PATCH V7] thermal/core/power_allocator: avoid thermal cdev can not be reset

From: Lukasz Luba
Date: Wed Jan 10 2024 - 08:03:43 EST




On 1/10/24 11:55, Di Shen wrote:
Commit 0952177f2a1f ("thermal/core/power_allocator: Update once
cooling devices when temp is low") added an update flag to avoid
triggering thermal events when they are not needed, so that the
thermal cdev is updated only once when the temperature is low.

However, when the trips are writable and switch_on_temp is raised
to a higher value, the cooling device state may not be reset to 0,
because last_temperature is lower than the new switch_on_temp.

For example:
First:
switch_on_temp=70 control_temp=85;
Then userspace changes the trip_temp:
switch_on_temp=45 control_temp=55 cur_temp=54

Then userspace resets the trip_temp:
switch_on_temp=70 control_temp=85 cur_temp=57 last_temp=54

At this point, the cooling device state should be reset to 0.
However, because cur_temp(57) < switch_on_temp(70) and
last_temp(54) < switch_on_temp(70), update is false and the
cooling device state cannot be reset.
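
The failing case above can be checked with a small standalone C sketch.
It is not kernel code: old_update_flag() and below_switch_on() are
hypothetical helpers that only mirror the pre-patch expression and the
branch condition, and temperatures are plain integers in the same units
as the example.

```c
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the pre-patch expression:
 *     update = tz->last_temperature >= trip->temperature;
 */
static bool old_update_flag(int last_temp, int switch_on_temp)
{
	return last_temp >= switch_on_temp;
}

/* True when the governor would take the "below switch_on" branch. */
static bool below_switch_on(int cur_temp, int switch_on_temp)
{
	return cur_temp < switch_on_temp;
}
```

With the values from the example, below_switch_on(57, 70) is true, so
the reset branch is taken, but old_update_flag(54, 70) is false, so the
cooling devices are left in their throttled state.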

Since tz->passive also reflects the temperature status, this patch
derives the update flag from tz->passive instead.

The first time the temperature drops below switch_on, the states
of the cooling devices are reset once and tz->passive is set to 0.
In the next round, because tz->passive is 0, cdev->state is not
updated.

Using tz->passive as the "update" flag solves the issue above, and
the cooling devices are updated only once when the temperature is
low.
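
As an illustration of the behaviour described above, here is a minimal
userspace model, not the kernel implementation. struct zone_model and
poll_resets_cdevs() are hypothetical names, and the model assumes that
the governor sets tz->passive to 1 whenever the temperature is at or
above switch_on, which the hunk in this patch does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two thermal zone fields that matter here. */
struct zone_model {
	int temperature;
	int passive;
};

/*
 * Model one governor poll. Returns true when this poll would reset the
 * cooling devices, i.e. when "update" is true in the low-temperature branch.
 */
static bool poll_resets_cdevs(struct zone_model *tz, int switch_on_temp)
{
	bool update;

	assert(tz != NULL);

	if (tz->temperature < switch_on_temp) {
		update = tz->passive;	/* post-patch: flag comes from passive */
		tz->passive = 0;
		return update;
	}

	/* Assumption: at or above switch_on the zone is marked passive. */
	tz->passive = 1;
	return false;		/* power allocation itself is not modelled */
}
```

Walking through the example: a poll at 54 with switch_on_temp=45 sets
passive to 1. After the trips are restored to switch_on_temp=70, the
first poll at 57 returns true and clears passive, and every later poll
below switch_on returns false, so the reset happens exactly once.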

Fixes: 0952177f2a1f ("thermal/core/power_allocator: Update once cooling devices when temp is low")
Cc: <stable@xxxxxxxxxxxxxxx> # v5.13+
Suggested-by: Wei Wang <wvw@xxxxxxxxxx>
Signed-off-by: Di Shen <di.shen@xxxxxxxxxx>

---
V7:
- Some formatting changes.
- Add Suggested-by tag.

V6: [6]
Compared to the previous version:
- Do not change the thermal core.
- Do not add new variables or functions.
- Use tz->passive as "update" flag to indicate whether the cooling
  devices should be reset.

V5: [5]
- Simplify the reset ops: no return value and no specific trip ID
  as argument.
- Extend the commit message.

V4: [4]
- Compared to V3, handle it in thermal core instead of in governor.
- Add an ops to the governor structure, and call it when a trip
point is changed.
- Define reset ops for power allocator.

V3: [3]
- Add fix tag.

V2: [2]
- Compared to v1, do not revert.
- Add a variable(last_switch_on_temp) in power_allocator_params
to record the last switch_on_temp value.
- Add a function to renew the update flag and update the
  last_switch_on_temp when thermal trips are writable.

V1: [1]
- Revert commit 0952177f2a1f.

[1] https://lore.kernel.org/all/20230309135515.1232-1-di.shen@xxxxxxxxxx/
[2] https://lore.kernel.org/all/20230315093008.17489-1-di.shen@xxxxxxxxxx/
[3] https://lore.kernel.org/all/20230320095620.7480-1-di.shen@xxxxxxxxxx/
[4] https://lore.kernel.org/all/20230619063534.12831-1-di.shen@xxxxxxxxxx/
[5] https://lore.kernel.org/all/20230710033234.28641-1-di.shen@xxxxxxxxxx/
[6] https://lore.kernel.org/all/20240109112736.32566-1-di.shen@xxxxxxxxxx/
---
---
drivers/thermal/gov_power_allocator.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
index 7b6aa265ff6a..81e061f183ad 100644
--- a/drivers/thermal/gov_power_allocator.c
+++ b/drivers/thermal/gov_power_allocator.c
@@ -762,7 +762,7 @@ static int power_allocator_throttle(struct thermal_zone_device *tz,
 	trip = params->trip_switch_on;
 	if (trip && tz->temperature < trip->temperature) {
-		update = tz->last_temperature >= trip->temperature;
+		update = tz->passive;
 		tz->passive = 0;
 		reset_pid_controller(params);
 		allow_maximum_power(tz, update);

Thanks for the patch, LGTM.

Reviewed-by: Lukasz Luba <lukasz.luba@xxxxxxx>