Re: [PATCH V4] thermal/core/power_allocator: reset thermal governor when trip point is changed

From: Lukasz Luba
Date: Tue Jun 20 2023 - 07:56:42 EST




On 6/20/23 11:39, Rafael J. Wysocki wrote:
On Tue, Jun 20, 2023 at 12:19 PM Lukasz Luba <lukasz.luba@xxxxxxx> wrote:

Hi Rafael,


On 6/20/23 11:07, Rafael J. Wysocki wrote:
On Tue, Jun 20, 2023 at 11:46 AM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:

On Mon, Jun 19, 2023 at 8:36 AM Di Shen <di.shen@xxxxxxxxxx> wrote:

When the thermal trip point is changed, the governor should
be reset so that the policy algorithm be updated to adapt to the
new trip point.

This patch adds an ops for thermal the governor structure to reset
the governor. The ops is called when the trip point is changed.
For power allocator, the parameters of pid controller and the states
of power cooling devices can be reset when the passive trip point
is changed.

Signed-off-by: Di Shen <di.shen@xxxxxxxxxx>

---
V4:
- Compared to V3, handle it in thermal core instead of in governor.

- Add an ops to the governor structure, and call it when a trip
point is changed.

- Define reset ops for power allocator.

V3:
- Add fix tag.

V2:
- Compared to v1, do not revert.

- Add a variable(last_switch_on_temp) in power_allocator_params
to record the last switch_on_temp value.

- Adds a function to renew the update flag and update the
last_switch_on_temp when thermal trips are writable.

V1:
- Revert commit 0952177f2a1f.
---
---
drivers/thermal/gov_power_allocator.c | 21 +++++++++++++++++++++
drivers/thermal/thermal_trip.c | 6 ++++++
include/linux/thermal.h | 1 +
3 files changed, 28 insertions(+)

diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
index 8642f1096b91..41d155adc616 100644
--- a/drivers/thermal/gov_power_allocator.c
+++ b/drivers/thermal/gov_power_allocator.c
@@ -729,10 +729,31 @@ static int power_allocator_throttle(struct thermal_zone_device *tz, int trip_id)
return allocate_power(tz, trip.temperature);
}

+static int power_allocator_reset(struct thermal_zone_device *tz, int trip_id)
+{
+ int ret = 0;
+ struct thermal_trip trip;
+ struct power_allocator_params *params = tz->governor_data;
+
+ ret = __thermal_zone_get_trip(tz, trip_id, &trip);
+ if (ret)
+ return ret;
+
+ /* Only need reset for passive trips */
+ if (trip.type != THERMAL_TRIP_PASSIVE)
+ return -EINVAL;
+
+ reset_pid_controller(params);
+ allow_maximum_power(tz, true);
+
+ return ret;
+}
+
static struct thermal_governor thermal_gov_power_allocator = {
.name = "power_allocator",
.bind_to_tz = power_allocator_bind,
.unbind_from_tz = power_allocator_unbind,
.throttle = power_allocator_throttle,
+ .reset = power_allocator_reset,
};
THERMAL_GOVERNOR_DECLARE(thermal_gov_power_allocator);
diff --git a/drivers/thermal/thermal_trip.c b/drivers/thermal/thermal_trip.c
index 907f3a4d7bc8..52eb768fada8 100644
--- a/drivers/thermal/thermal_trip.c
+++ b/drivers/thermal/thermal_trip.c
@@ -173,6 +173,12 @@ int thermal_zone_set_trip(struct thermal_zone_device *tz, int trip_id,
if (tz->trips && (t.temperature != trip->temperature || t.hysteresis != trip->hysteresis))
tz->trips[trip_id] = *trip;

+ if (t.temperature != trip->temperature && tz->governor && tz->governor->reset) {
+ ret = tz->governor->reset(tz, trip_id);
+ if (ret)
+ pr_warn_once("Failed to reset thermal governor\n");

I'm not really sure if it is useful to print this message here.

First off, the governors may print more precise diagnostic messages if
they care.

Second, what is the sysadmin supposed to do in response to this message?

In addition to the above, trip point temperatures may be updated in
other places too, for instance in response to notifications from
platform firmware and IMV this new callback should be also used in
those cases. However, in those cases multiple trip points may change
at a time and the critical/hot trip point temperatures may be updated
too AFAICS.

IIRC the critical/hot trip points are handled differently, not using the
governors. The governors' 'throttle' callback would be called only
after we pass the test of 'critical/hot' [1].

OK, but is it actually useful to return an error code from the
->reset() callback when passed a non-passive trip point?

It will depend on the governor code. In our case the setup code
w.r.t. trip types is quite confusing (to fit into many possible
configurations). The non-passive trip point would be only
possible to bind when there are not other passive trip points.
That's is a really corner case and probably never used on any
device. Therefore, IMO we can just bail out in such situation
when then someone tries to update such single non-passive
trip point (probably not aware what is doing with IPA?).

For the rest of the governors - it's up to them what they
report in case non-passive trip is updated...


What Di is facing is in the issue under the bucket of
'handle_non_critical_trips()' when the governor just tries to
work on stale data - old trip temp.

Well, fair enough, but what about the other governors? Is this
problem limited to power_allocator?

IIUC the core fwk code - non of the governors would be needed
to handle the critical/hot trips. For the rest of the trip types
I would say it's up to the governor. In our IPA case this stale
data is used for power budget estimation - quite fundamental
step. Therefore, the reset and start from scratch would make more
sense.

I think other governors don't try to 'estimate' such
abstract power headroom based on temperature - so probably
they don't have to even implement the 'reset()' callback
(I don't know their logic that well).


For the 2nd case IIUC the code, we pass the 'trip.temperature'
and should be ready for what you said (modification of that value).

Generally speaking, it needs to be prepared for a simultaneous change
of multiple trip points (including active), in which case it may not
be useful to invoke the ->reset() callback for each of them
individually.

Although, that looks more cleaner IMO. Resetting one by one in
a temperature order would be easily maintainable, won't be?


Moreover, Daniel wants trip points to be sorted by temperature
eventually and in that case indices may change overall even on one
trip point update.

It's more complicated I think that this $subject, but very useful.



Furthermore, the critical/hot situation is handled w/o governor
assistance, so we should be safe:
tz->ops->critical(tz) or tz->ops->hot(tz) and not
tz->governor->throttle(tz, trip)

That's fine.