Re: [PATCH 1/3] thermal/drivers/intel: Use generic trip points for quark_dts

From: Rafael J. Wysocki
Date: Thu Feb 02 2023 - 05:32:31 EST


On Wed, Feb 1, 2023 at 8:27 PM Daniel Lezcano <daniel.lezcano@xxxxxxxxxx> wrote:
>
> On 01/02/2023 19:47, Rafael J. Wysocki wrote:
> > On Wed, Feb 1, 2023 at 11:42 AM Daniel Lezcano
> > <daniel.lezcano@xxxxxxxxxx> wrote:
> >>
> >> On 31/01/2023 20:11, Rafael J. Wysocki wrote:
> >>> On Tue, Jan 31, 2023 at 5:41 PM Daniel Lezcano
> >>> <daniel.lezcano@xxxxxxxxxx> wrote:
> >>>>
> >>>> On 26/01/2023 15:15, Rafael J. Wysocki wrote:
> >>>>> On Wed, Jan 18, 2023 at 7:16 PM Daniel Lezcano
> >>>>> <daniel.lezcano@xxxxxxxxxx> wrote:
> >>>>>>
> >>>>>> The thermal framework gives the possibility to register the trip
> >>>>>> points with the thermal zone. When that is done, no get_trip_* ops are
> >>>>>> needed and they can be removed.
> >>>>>>
> >>>>>> Convert ops content logic into generic trip points and register them with the
> >>>>>> thermal zone.
> >>>>>>
> >>>>>> Signed-off-by: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>
> >>>>>> ---
> >>>>
> >>>> [ ... ]
> >>>>
> >>>>>> - aux_entry->tzone = thermal_zone_device_register("quark_dts",
> >>>>>> - QRK_MAX_DTS_TRIPS,
> >>>>>> - wr_mask,
> >>>>>> - aux_entry, &tzone_ops, NULL, 0, polling_delay);
> >>>>>> + err = get_trip_temp(QRK_DTS_ID_TP_CRITICAL, &temperature);
> >>>>>> + if (err)
> >>>>>> + goto err_ret;
> >>>>>> +
> >>>>>> + aux_entry->trips[QRK_DTS_ID_TP_CRITICAL].temperature = temperature;
> >>>>>> + aux_entry->trips[QRK_DTS_ID_TP_CRITICAL].type = THERMAL_TRIP_CRITICAL;
> >>>>>> +
> >>>>>> + err = get_trip_temp(QRK_DTS_ID_TP_HOT, &temperature);
> >>>>>> + if (err)
> >>>>>> + goto err_ret;
> >>>>>
> >>>>> If I'm not mistaken, this won't even try to register the thermal zone
> >>>>> if at least one trip cannot be initialized, but previously it was
> >>>>> registered in that case, but the trips that failed to respond were
> >>>>> disabled.
> >>>>>
> >>>>> This is a change in behavior that would at least need to be documented
> >>>>> in the changelog, but it isn't.
> >>>>>
> >>>>> I'm not sure if it is safe to make even, however.
> >>>>
> >>>> Thanks for catching this.
> >>>>
> >>>> Two solutions:
> >>>>
> >>>> 1. Set the temperature to THERMAL_TEMP_INVALID and change
> >>>> get_thermal_trip() to return -EINVAL or -ERANGE if the temperature is
> >>>> THERMAL_TEMP_INVALID
> >>>>
> >>>> 2. Register only the valid trip points.
> >>>>
> >>>> What would be the preferable way ?
> >>>
> >>> I think that the trip points that are registered currently need to
> >>> still be registered after the change.
> >>>
> >>> Does registering a trip point with the temperature set to
> >>> THERMAL_TEMP_INVALID cause it to be effectively disabled?
> >>
> >> The initial behavior before the changes is:
> >>
> >> The function thermal_zone_device_register() will go through all the trip
> >> points and call thermal_zone_get_trip(), resulting in a call to
> >> ops->get_trip_temp(). If the call fails, the trip point is tagged as
> >> disabled and will stay in this state forever, so discarded in the trip
> >> point crossed detection.
> >>
> >> That does not report an error and the trip point is showed in sysfs but
> >> in a inconsistent state as it is actually disabled. Reading the trip
> >> point will return an error or not, but it is in any case disabled in the
> >> thermal framework. The userspace does not have the information about the
> >> trip point being disabled, so showing it up regardless its state is
> >> pointless and prone to confusion for the userspace.
> >>
> >> IMO, it would be more sane to register the trip points which are
> >> actually valid, so invalid trip points are not showed up and does
> >> prevent extra complexity in the thermal core to handle them.
> >
> > Except when the trip point can be updated to become a valid one later,
> > for example in response to a system configuration change. That can
> > happen to ACPI-provided trip points, for example.
> >
> > I don't think that this is an issue for this particular driver, but
> > the core needs to handle that case anyway.
>
> Yes, but the point is the core code never handled that case.

True.

What I wanted to say, though, is that the core needs to allow
registering trip points with THERMAL_TEMP_INVALID without disabling
them automatically, so they can be updated and used later.

> If the trip point fails when registering the thermal zone (and this is
> not related to our changes), the trip point is added to the disabled
> trips bitmap and then whatever the action to validate the trip point, it
> remains disabled for the thermal framework. There is no action to enable
> it (except I missed something).
>
> > Moreover, there is the case when trip points only become relevant when
> > their temperatures are set via ops->set_trip_temp() and they are
> > THERMAL_TEMP_INVALID initially, which needs to be handled by the core
> > either.
>
> Ok, then I guess the simplest change is to assign THERMAL_TEMP_INVALID
> in this driver, if get_trip_temp fails at the initialization time.
>
> Later we can add a thermal_zone_device_update_trips() with the needed
> locking and actions related to the update.

Well, there is thermal_zone_device_update() and one of the events it
is supposed to handle is THERMAL_TRIP_CHANGED, so I'm not sure how the
new interface would differ from it?