Re: [RFC PATCH] thermal/core: Fix trip point crossing events ordering

From: Rafael J. Wysocki
Date: Wed Mar 06 2024 - 07:03:14 EST


On Wed, Mar 6, 2024 at 9:54 AM Daniel Lezcano <daniel.lezcano@xxxxxxxxxx> wrote:
>
> Let's assume the following setup:
>
> - trip 0 = 65°C
> - trip 1 = 70°C
> - trip 2 = 75°C
>
> The current temperature is 35°C.
>
> The interrupt is set up to fire at 65°C. If the thermal capacity is
> saturated, the temperature may already have jumped to 72°C by the time
> it is read after the interrupt fired for crossing 65°C. That means two
> events should be notified to userspace: the first one for trip 0 and
> the second one for trip 1.
>
> When the function thermal_zone_update() is called from the threaded
> interrupt, it will read the temperature and then call for_each_trip(),
> which in turn calls handle_trip_point().
>
> This function will check:
>
>         if (tz->last_temperature < trip->temperature &&
>             tz->temperature >= trip->temperature)
>                 thermal_notify_tz_trip_up()

For the mainline:

$ git grep handle_trip_point | cat
$

Do you mean handle_thermal_trip()?

But it doesn't do the above in the mainline.  It does this (comments omitted):

        if (tz->last_temperature < trip->threshold) {
                if (tz->temperature >= trip->temperature) {
                        thermal_notify_tz_trip_up(tz, trip);
                        thermal_debug_tz_trip_up(tz, trip);
                        trip->threshold = trip->temperature - trip->hysteresis;
                } else {
                        trip->threshold = trip->temperature;
                }
        }

>
> So here, we will call this function with trip 0 followed by trip 1.
> That will result in an event for each trip point, reflecting the trip
> point being crossed on the way up as the temperature rises. So far, so
> good.
>
> Usually the sensors have an interrupt when a trip temperature is crossed
> on the way up but not on the way down, so there is an extra delay,
> corresponding to the passive polling, during which the temperature
> could have dropped and crossed more than one trip point. This scenario
> is more likely to happen when multiple trip points are specified. So,
> considering the same setup, after crossing trip 2 we stop the workload
> responsible for the heat and the temperature drops suddenly to 62°C.
> In this case, the next polling will call thermal_zone_device_update(),
> then for_each_trip() and handle_trip_point().
>
> This function will check:
>
>         if (tz->last_temperature >= trip->temperature &&
>             tz->temperature < trip->temperature - trip->hysteresis)
>                 thermal_notify_tz_trip_down()

Again, assuming that you mean handle_thermal_trip(), the above is not
the current mainline code, which is (comments omitted):

        if (tz->last_temperature >= trip->threshold) {
                if (tz->temperature < trip->temperature - trip->hysteresis) {
                        thermal_notify_tz_trip_down(tz, trip);
                        thermal_debug_tz_trip_down(tz, trip);
                        trip->threshold = trip->temperature;
                } else {
                        trip->threshold = trip->temperature - trip->hysteresis;
                }
        }

I guess this doesn't matter here?
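
For reference, the two mainline checks quoted above can be modelled in
user space roughly as follows.  This is only a sketch: struct trip_model,
struct zone_model, model_handle_trip() and model_update() are made-up
names, the notification calls are replaced by a return value, and merging
the two checks into a single if/else is a simplification of the sketch,
not a statement about how the kernel function is laid out.

```c
/* Hypothetical user-space stand-ins, not the kernel structures. */
struct trip_model {
	int temperature;
	int hysteresis;
	int threshold;
};

struct zone_model {
	int last_temperature;
	int temperature;
};

enum crossing { CROSS_NONE, CROSS_UP, CROSS_DOWN };

/*
 * Modelled on the two snippets quoted above.  Merging them into one
 * if/else is an assumption of this sketch.
 */
static enum crossing model_handle_trip(const struct zone_model *tz,
				       struct trip_model *trip)
{
	if (tz->last_temperature < trip->threshold) {
		if (tz->temperature >= trip->temperature) {
			/* Crossed on the way up: arm the lower threshold. */
			trip->threshold = trip->temperature - trip->hysteresis;
			return CROSS_UP;
		}
		trip->threshold = trip->temperature;
	} else {
		if (tz->temperature < trip->temperature - trip->hysteresis) {
			/* Crossed on the way down: arm the upper threshold. */
			trip->threshold = trip->temperature;
			return CROSS_DOWN;
		}
		trip->threshold = trip->temperature - trip->hysteresis;
	}
	return CROSS_NONE;
}

/* Feed one new reading into the model and report what happened. */
static enum crossing model_update(struct zone_model *tz,
				  struct trip_model *trip, int temp)
{
	tz->last_temperature = tz->temperature;
	tz->temperature = temp;
	return model_handle_trip(tz, trip);
}
```

With a trip at 70 and a hysteresis of 5, a reading of 72 after 35 reports
an upward crossing and moves the threshold to 65; readings between 65 and
70 then report nothing, and only a reading below 65 reports the downward
crossing.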

> The loop for_each_trip() will visit trip 0, 1 and 2, in that order.
> That will generate the events for trip 0, 1 and 2, which is the wrong
> order for a falling temperature. It does not reflect the thermal
> dynamics and confuses the userspace monitoring the temperature.

That is the case only if the trips table happens to be sorted by
ascending temperature, but the trips don't need to be ordered in any
particular way.

> Fix this by inspecting the trend of the temperature. If it is rising,
> then we walk the trip points in ascending order; if it is falling,
> then we walk them in descending order.
>
> Signed-off-by: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>
> ---
> drivers/thermal/thermal_core.c | 8 ++++++--
> drivers/thermal/thermal_core.h | 3 +++
> 2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index dfaa6341694a..abb8ee5c9afe 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -473,8 +473,12 @@ void __thermal_zone_device_update(struct thermal_zone_device *tz,
>
>  	tz->notify_event = event;
>
> -	for_each_trip(tz, trip)
> -		handle_thermal_trip(tz, trip);
> +	if (tz->last_temperature < tz->temperature)
> +		for_each_trip(tz, trip)
> +			handle_thermal_trip(tz, trip);
> +	else
> +		for_each_trip_reverse(tz, trip)
> +			handle_thermal_trip(tz, trip);

This works assuming a "proper" ordering of the trips.
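
To illustrate the dependency on the table order, here is a small
user-space model of the direction-dependent walk.  It is a sketch under
assumptions: struct trip_model, struct event_log, model_handle_trip() and
model_update() are hypothetical names, the check is modelled on the
mainline snippets quoted earlier, and the reverse walk is done by index
rather than with the macro from the patch.

```c
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct trip_model {
	int id;
	int temperature;
	int hysteresis;
	int threshold;
};

#define MAX_EVENTS 8

struct event_log {
	int ids[MAX_EVENTS];	/* trip ids in the order they were reported */
	size_t count;
};

static void log_event(struct event_log *log, int id)
{
	if (log->count < MAX_EVENTS)
		log->ids[log->count++] = id;
}

/* Threshold-based check, modelled on the snippets quoted earlier. */
static void model_handle_trip(struct trip_model *trip, int last, int temp,
			      struct event_log *log)
{
	if (last < trip->threshold) {
		if (temp >= trip->temperature) {
			log_event(log, trip->id);	/* crossed on the way up */
			trip->threshold = trip->temperature - trip->hysteresis;
		} else {
			trip->threshold = trip->temperature;
		}
	} else {
		if (temp < trip->temperature - trip->hysteresis) {
			log_event(log, trip->id);	/* crossed on the way down */
			trip->threshold = trip->temperature;
		} else {
			trip->threshold = trip->temperature - trip->hysteresis;
		}
	}
}

/* Walk the table forwards when rising and backwards otherwise. */
static void model_update(struct trip_model *trips, size_t num_trips,
			 int last, int temp, struct event_log *log)
{
	size_t i;

	log->count = 0;
	if (last < temp) {
		for (i = 0; i < num_trips; i++)
			model_handle_trip(&trips[i], last, temp, log);
	} else {
		for (i = num_trips; i-- > 0; )
			model_handle_trip(&trips[i], last, temp, log);
	}
}
```

With a table stored as 65, 70, 75, a drop from 76 to 62 reports the trips
in descending temperature order, as intended.  With the same trips stored
as 70, 75, 65, the reverse walk reports 65 first, then 75, then 70, so the
events are still out of order.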

>
> monitor_thermal_zone(tz);
> }
> diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
> index e9c099ecdd0f..0072b3d4039e 100644
> --- a/drivers/thermal/thermal_core.h
> +++ b/drivers/thermal/thermal_core.h
> @@ -123,6 +123,9 @@ void thermal_governor_update_tz(struct thermal_zone_device *tz,
>  #define for_each_trip(__tz, __trip)	\
>  	for (__trip = __tz->trips; __trip - __tz->trips < __tz->num_trips; __trip++)
>
> +#define for_each_trip_reverse(__tz, __trip)	\
> +	for (__trip = &__tz->trips[__tz->num_trips - 1]; __trip >= __tz->trips ; __trip--)
> +
>  void __thermal_zone_set_trips(struct thermal_zone_device *tz);
>  int thermal_zone_trip_id(const struct thermal_zone_device *tz,
>  			 const struct thermal_trip *trip);
> --

Generally speaking, this is a matter of getting alignment on the
expectations between the kernel and user space.

It looks like user space expects to get the notifications in the order
of either growing or falling temperatures, depending on the direction
of the temperature change. Ordering the trips in the kernel is not
practical, but the notifications can be ordered in principle. Is this
what you'd like to do?

Or can user space be bothered with recognizing that it may get the
notifications for different trips out of order?
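
As a rough user-space illustration of what ordering the notifications
(rather than the trips) could look like: collect the trips found to be
crossed during one update, sort that list by temperature in the direction
of the change, and only then send the events.  Everything below is
hypothetical (struct crossed_trip, order_crossed_trips(), sorting by the
trip temperature alone), and qsort() is used only because this is a
user-space sketch.

```c
#include <stdlib.h>

/* Hypothetical record of one trip crossed during a zone update. */
struct crossed_trip {
	int id;
	int temperature;
};

/* Ascending temperature: the order wanted when the temperature rises. */
static int cmp_rising(const void *a, const void *b)
{
	const struct crossed_trip *x = a;
	const struct crossed_trip *y = b;

	return (x->temperature > y->temperature) -
	       (x->temperature < y->temperature);
}

/* Descending temperature: the order wanted when the temperature falls. */
static int cmp_falling(const void *a, const void *b)
{
	return cmp_rising(b, a);
}

/*
 * Sort the crossed trips so that notifications sent in array order
 * follow the direction of the temperature change.  Using the trip
 * temperature as the only sort key is an assumption of this sketch.
 */
static void order_crossed_trips(struct crossed_trip *crossed, size_t n,
				int rising)
{
	qsort(crossed, n, sizeof(*crossed), rising ? cmp_rising : cmp_falling);
}
```

The caller would fill the array while walking the trips in whatever order
they are stored, call order_crossed_trips() once, and then emit one event
per entry.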