Re: [RFC PATCH] thermal/core: Fix trip point crossing events ordering

From: Rafael J. Wysocki
Date: Thu Mar 07 2024 - 06:39:27 EST


On Wed, Mar 6, 2024 at 8:32 PM Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote:
>
> On Wednesday, March 6, 2024 4:55:51 PM CET Daniel Lezcano wrote:
> > On 06/03/2024 16:41, Rafael J. Wysocki wrote:
> > > On Wed, Mar 6, 2024 at 2:16 PM Daniel Lezcano <daniel.lezcano@xxxxxxxxxx> wrote:
> > >>
> > >> On 06/03/2024 13:53, Rafael J. Wysocki wrote:
> > >>> On Wed, Mar 6, 2024 at 1:43 PM Daniel Lezcano <daniel.lezcano@xxxxxxxxxx> wrote:
> > >>>>
> > >>>> On 06/03/2024 13:02, Rafael J. Wysocki wrote:
> > >>>>
> > >>>> [ ... ]
> > >>>>
> > >>>>>> +#define for_each_trip_reverse(__tz, __trip) \
> > >>>>>> + for (__trip = &__tz->trips[__tz->num_trips - 1]; __trip >= __tz->trips ; __trip--)
> > >>>>>> +
> > >>>>>> void __thermal_zone_set_trips(struct thermal_zone_device *tz);
> > >>>>>> int thermal_zone_trip_id(const struct thermal_zone_device *tz,
> > >>>>>> const struct thermal_trip *trip);
> > >>>>>> --
> > >>>>>
> > >>>>> Generally speaking, this is a matter of getting alignment on the
> > >>>>> expectations between the kernel and user space.
> > >>>>>
> > >>>>> It looks like user space expects to get the notifications in the order
> > >>>>> of either growing or falling temperatures, depending on the direction
> > >>>>> of the temperature change. Ordering the trips in the kernel is not
> > >>>>> practical, but the notifications can be ordered in principle. Is this
> > >>>>> what you'd like to do?
> > >>>>
> > >>>> Yes
> > >>>>
> > >>>>> Or can user space be bothered with recognizing that it may get the
> > >>>>> notifications for different trips out of order?
> > >>>>
> > >>>> IMO it is a bad information if the trip points events are coming
> > >>>> unordered. The temperature signal is a time related measurements, the
> > >>>> userspace should receive thermal information from this signal in the
> > >>>> right order. It sounds strange to track the temperature signal in the
> > >>>> kernel, then scramble the information, pass it to the userspace and
> > >>>> except it to apply some kind of logic to unscramble it.
> > >>>
> > >>> So the notifications can be ordered before sending them out, as long
> > >>> as they are produced by a single __thermal_zone_device_update() call.
> > >>>
> > >>> I guess you also would like the thermal_debug_tz_trip_up/down() calls
> > >>> to be ordered, wouldn't you?
> > >>
> > >> Right
> > >
> > > I have an idea how to do this, but it is based on a couple of patches
> > > that I've been working on in the meantime.
> > >
> > > Let me post these patches first and then I'll send a prototype patch
> > > addressing this on top of them.
> >
> > That is awesome, thanks !
>
> Anytime!
>
> Now that I've posted this series:
>
> https://lore.kernel.org/linux-pm/4558384.LvFx2qVVIh@kreacher/
>
> I can append the patch below that is based on it.
>
> The idea is really straightforward: Instead of sending the notifications
> and recording the stats right away, create two lists of trips for which
> they need to be send, sort them and then send the notifications etc in
> the right order. I want to avoid explicit memory allocations that can
> fail in principle, which is why lists are used.
>
> The reason why two lists are used is in case the trips are updated and
> that's why they appear to be crossed (which may not depend on the actual
> temperature change).
>
> One caveat is that the lists are sorted by trip thresholds (because they
> are the real values take into account in the code), but user space may
> expect them to be sorted by trip temperatures instead. That can be changed.

I've concluded that it is better to sort by trip temperature, because
the thresholds used here are the new ones (computed after the trips
have been crossed) which may be confusing.

> ---
> drivers/thermal/thermal_core.c | 39 +++++++++++++++++++++++++++++++++------
> drivers/thermal/thermal_core.h | 1 +
> 2 files changed, 34 insertions(+), 6 deletions(-)
>
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -15,6 +15,7 @@
> #include <linux/slab.h>
> #include <linux/kdev_t.h>
> #include <linux/idr.h>
> +#include <linux/list_sort.h>
> #include <linux/thermal.h>
> #include <linux/reboot.h>
> #include <linux/string.h>
> @@ -361,7 +362,9 @@ static void handle_critical_trips(struct
> }
>
> static void handle_thermal_trip(struct thermal_zone_device *tz,
> - struct thermal_trip_desc *td)
> + struct thermal_trip_desc *td,
> + struct list_head *way_up_list,
> + struct list_head *way_down_list)
> {
> const struct thermal_trip *trip = &td->trip;
>
> @@ -382,8 +385,7 @@ static void handle_thermal_trip(struct t
> * the threshold and the trip temperature will be equal.
> */
> if (tz->temperature >= trip->temperature) {
> - thermal_notify_tz_trip_up(tz, trip);
> - thermal_debug_tz_trip_up(tz, trip);
> + list_add_tail(&td->notify_list_node, way_up_list);
> td->threshold = trip->temperature - trip->hysteresis;
> } else {
> td->threshold = trip->temperature;
> @@ -400,8 +402,7 @@ static void handle_thermal_trip(struct t
> * the trip.
> */
> if (tz->temperature < trip->temperature - trip->hysteresis) {
> - thermal_notify_tz_trip_down(tz, trip);
> - thermal_debug_tz_trip_down(tz, trip);
> + list_add(&td->notify_list_node, way_down_list);
> td->threshold = trip->temperature;
> } else {
> td->threshold = trip->temperature - trip->hysteresis;
> @@ -457,10 +458,24 @@ static void thermal_zone_device_init(str
> pos->initialized = false;
> }
>
> +static int thermal_trip_notify_cmp(void *ascending, const struct list_head *a,
> + const struct list_head *b)
> +{
> + struct thermal_trip_desc *tda = container_of(a, struct thermal_trip_desc,
> + notify_list_node);
> + struct thermal_trip_desc *tdb = container_of(b, struct thermal_trip_desc,
> + notify_list_node);
> + int ret = tdb->threshold - tda->threshold;

So this will become

+ int ret = tdb->trip.temperature - tda->trip.temperature;

> +
> + return ascending ? ret : -ret;
> +}
> +
> void __thermal_zone_device_update(struct thermal_zone_device *tz,
> enum thermal_notify_event event)
> {
> struct thermal_trip_desc *td;
> + LIST_HEAD(way_down_list);
> + LIST_HEAD(way_up_list);
>
> if (tz->suspended)
> return;
> @@ -475,7 +490,19 @@ void __thermal_zone_device_update(struct
> tz->notify_event = event;
>
> for_each_trip_desc(tz, td)
> - handle_thermal_trip(tz, td);
> + handle_thermal_trip(tz, td, &way_up_list, &way_down_list);
> +
> + list_sort((void *)true, &way_up_list, thermal_trip_notify_cmp);
> + list_for_each_entry(td, &way_up_list, notify_list_node) {
> + thermal_notify_tz_trip_up(tz, &td->trip);
> + thermal_debug_tz_trip_up(tz, &td->trip);
> + }
> +
> + list_sort(NULL, &way_down_list, thermal_trip_notify_cmp);
> + list_for_each_entry(td, &way_down_list, notify_list_node) {
> + thermal_notify_tz_trip_down(tz, &td->trip);
> + thermal_debug_tz_trip_down(tz, &td->trip);
> + }
>
> monitor_thermal_zone(tz);
> }
> Index: linux-pm/drivers/thermal/thermal_core.h
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.h
> +++ linux-pm/drivers/thermal/thermal_core.h
> @@ -17,6 +17,7 @@
>
> struct thermal_trip_desc {
> struct thermal_trip trip;
> + struct list_head notify_list_node;
> int threshold;
> };
>