Re: [RFC PATCH] thermal/core: Fix trip point crossing events ordering

From: Rafael J. Wysocki
Date: Wed Mar 06 2024 - 14:32:46 EST


On Wednesday, March 6, 2024 4:55:51 PM CET Daniel Lezcano wrote:
> On 06/03/2024 16:41, Rafael J. Wysocki wrote:
> > On Wed, Mar 6, 2024 at 2:16 PM Daniel Lezcano <daniel.lezcano@xxxxxxxxxx> wrote:
> >>
> >> On 06/03/2024 13:53, Rafael J. Wysocki wrote:
> >>> On Wed, Mar 6, 2024 at 1:43 PM Daniel Lezcano <daniel.lezcano@xxxxxxxxxx> wrote:
> >>>>
> >>>> On 06/03/2024 13:02, Rafael J. Wysocki wrote:
> >>>>
> >>>> [ ... ]
> >>>>
> >>>>>> +#define for_each_trip_reverse(__tz, __trip) \
> >>>>>> + for (__trip = &__tz->trips[__tz->num_trips - 1]; __trip >= __tz->trips ; __trip--)
> >>>>>> +
> >>>>>> void __thermal_zone_set_trips(struct thermal_zone_device *tz);
> >>>>>> int thermal_zone_trip_id(const struct thermal_zone_device *tz,
> >>>>>> const struct thermal_trip *trip);
> >>>>>> --
> >>>>>
> >>>>> Generally speaking, this is a matter of getting alignment on the
> >>>>> expectations between the kernel and user space.
> >>>>>
> >>>>> It looks like user space expects to get the notifications in the order
> >>>>> of either growing or falling temperatures, depending on the direction
> >>>>> of the temperature change. Ordering the trips in the kernel is not
> >>>>> practical, but the notifications can be ordered in principle. Is this
> >>>>> what you'd like to do?
> >>>>
> >>>> Yes
> >>>>
> >>>>> Or can user space be bothered with recognizing that it may get the
> >>>>> notifications for different trips out of order?
> >>>>
> >>>> IMO it is a bad information if the trip points events are coming
> >>>> unordered. The temperature signal is a time related measurements, the
> >>>> userspace should receive thermal information from this signal in the
> >>>> right order. It sounds strange to track the temperature signal in the
> >>>> kernel, then scramble the information, pass it to the userspace and
> >>>> except it to apply some kind of logic to unscramble it.
> >>>
> >>> So the notifications can be ordered before sending them out, as long
> >>> as they are produced by a single __thermal_zone_device_update() call.
> >>>
> >>> I guess you also would like the thermal_debug_tz_trip_up/down() calls
> >>> to be ordered, wouldn't you?
> >>
> >> Right
> >
> > I have an idea how to do this, but it is based on a couple of patches
> > that I've been working on in the meantime.
> >
> > Let me post these patches first and then I'll send a prototype patch
> > addressing this on top of them.
>
> That is awesome, thanks !

Anytime!

Now that I've posted this series:

https://lore.kernel.org/linux-pm/4558384.LvFx2qVVIh@kreacher/

I can append the patch below that is based on it.

The idea is really straightforward: Instead of sending the notifications
and recording the stats right away, create two lists of trips for which
they need to be send, sort them and then send the notifications etc in
the right order. I want to avoid explicit memory allocations that can
fail in principle, which is why lists are used.

The reason why two lists are used is in case the trips are updated and
that's why they appear to be crossed (which may not depend on the actual
temperature change).

One caveat is that the lists are sorted by trip thresholds (because they
are the real values take into account in the code), but user space may
expect them to be sorted by trip temperatures instead. That can be changed.

---
drivers/thermal/thermal_core.c | 39 +++++++++++++++++++++++++++++++++------
drivers/thermal/thermal_core.h | 1 +
2 files changed, 34 insertions(+), 6 deletions(-)

Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -15,6 +15,7 @@
#include <linux/slab.h>
#include <linux/kdev_t.h>
#include <linux/idr.h>
+#include <linux/list_sort.h>
#include <linux/thermal.h>
#include <linux/reboot.h>
#include <linux/string.h>
@@ -361,7 +362,9 @@ static void handle_critical_trips(struct
}

static void handle_thermal_trip(struct thermal_zone_device *tz,
- struct thermal_trip_desc *td)
+ struct thermal_trip_desc *td,
+ struct list_head *way_up_list,
+ struct list_head *way_down_list)
{
const struct thermal_trip *trip = &td->trip;

@@ -382,8 +385,7 @@ static void handle_thermal_trip(struct t
* the threshold and the trip temperature will be equal.
*/
if (tz->temperature >= trip->temperature) {
- thermal_notify_tz_trip_up(tz, trip);
- thermal_debug_tz_trip_up(tz, trip);
+ list_add_tail(&td->notify_list_node, way_up_list);
td->threshold = trip->temperature - trip->hysteresis;
} else {
td->threshold = trip->temperature;
@@ -400,8 +402,7 @@ static void handle_thermal_trip(struct t
* the trip.
*/
if (tz->temperature < trip->temperature - trip->hysteresis) {
- thermal_notify_tz_trip_down(tz, trip);
- thermal_debug_tz_trip_down(tz, trip);
+ list_add(&td->notify_list_node, way_down_list);
td->threshold = trip->temperature;
} else {
td->threshold = trip->temperature - trip->hysteresis;
@@ -457,10 +458,24 @@ static void thermal_zone_device_init(str
pos->initialized = false;
}

+static int thermal_trip_notify_cmp(void *ascending, const struct list_head *a,
+ const struct list_head *b)
+{
+ struct thermal_trip_desc *tda = container_of(a, struct thermal_trip_desc,
+ notify_list_node);
+ struct thermal_trip_desc *tdb = container_of(b, struct thermal_trip_desc,
+ notify_list_node);
+ int ret = tdb->threshold - tda->threshold;
+
+ return ascending ? ret : -ret;
+}
+
void __thermal_zone_device_update(struct thermal_zone_device *tz,
enum thermal_notify_event event)
{
struct thermal_trip_desc *td;
+ LIST_HEAD(way_down_list);
+ LIST_HEAD(way_up_list);

if (tz->suspended)
return;
@@ -475,7 +490,19 @@ void __thermal_zone_device_update(struct
tz->notify_event = event;

for_each_trip_desc(tz, td)
- handle_thermal_trip(tz, td);
+ handle_thermal_trip(tz, td, &way_up_list, &way_down_list);
+
+ list_sort((void *)true, &way_up_list, thermal_trip_notify_cmp);
+ list_for_each_entry(td, &way_up_list, notify_list_node) {
+ thermal_notify_tz_trip_up(tz, &td->trip);
+ thermal_debug_tz_trip_up(tz, &td->trip);
+ }
+
+ list_sort(NULL, &way_down_list, thermal_trip_notify_cmp);
+ list_for_each_entry(td, &way_down_list, notify_list_node) {
+ thermal_notify_tz_trip_down(tz, &td->trip);
+ thermal_debug_tz_trip_down(tz, &td->trip);
+ }

monitor_thermal_zone(tz);
}
Index: linux-pm/drivers/thermal/thermal_core.h
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.h
+++ linux-pm/drivers/thermal/thermal_core.h
@@ -17,6 +17,7 @@

struct thermal_trip_desc {
struct thermal_trip trip;
+ struct list_head notify_list_node;
int threshold;
};