Re: [PATCH V4] thermal/core/power_allocator: reset thermal governor when trip point is changed

From: Di Shen
Date: Sun Jun 25 2023 - 04:40:25 EST


On Sat, Jun 24, 2023 at 1:54 AM Lukasz Luba <lukasz.luba@xxxxxxx> wrote:
>
>
>
> On 6/23/23 17:55, Rafael J. Wysocki wrote:
> > On Fri, Jun 23, 2023 at 9:43 AM Lukasz Luba <lukasz.luba@xxxxxxx> wrote:
> >>
> >>
> >>
>
> [snip]
>
> >>
> >> I agree, the patch header doesn't explain that properly. Here is the
> >> explanation for this Intelligent Power Allocator (IPA):
> >>
> >> The IPA controls temperature using PID mechanism. It's a closed
> >> feedback loop. That algorithm can 'learn' from the 'observed'
> >> in the past reaction for it's control decisions and accumulates that
> >> information in the part called 'error integral'. Those accumulated
> >> 'error' gaps are the differences between the set target value and the
> >> actually achieved value. In our case the target value is the target
> >> temperature which is coming from the trip point. That part is then used
> >> with the 'I' (of PID) component, so we can compensate for those
> >> 'learned' mistakes.
> >> Now, when you change the target temperature value - all your previous
> >> learned errors won't help you. That's why Intelligent Power Allocator
> >> should reset previously accumulated history.
> >
> > Right.
> >
> > And every other governor using information from the past for control
> > will have an analogous problem, won't it?
>
> Not necessarily, but to play safe I would go case-by-case and make
> sure other governors are aligned to this new feature.
>
> E.g. the bang-bang governor operates only on current temperature and
> current trip value + trip hysteresis. The flow graph describes it [1].
> The control (state of the fan: ON or OFF) of that governor could be
> simply adjusted to the new reality -> new trip point temp. That would
> just mean 'toggling' the fan if needed. There are only 2 'target'
> states: 0 or 1 for the fan. You can images a situation when the
> temperature doesn't change, but we manipulate the trip value for that
> governor. The governor would react correctly always in such situation
> w/o a need of a reset IMO.
>
> >
> >>>
> >>>>>
> >>>>>> For the 2nd case IIUC the code, we pass the 'trip.temperature'
> >>>>>> and should be ready for what you said (modification of that value).
> >>>>>
> >>>>> Generally speaking, it needs to be prepared for a simultaneous change
> >>>>> of multiple trip points (including active), in which case it may not
> >>>>> be useful to invoke the ->reset() callback for each of them
> >>>>> individually.
> >>>>
> >>>> Although, that looks more cleaner IMO. Resetting one by one in
> >>>> a temperature order would be easily maintainable, won't be?
> >>>
> >>> I wouldn't call it maintainable really.
> >>>
> >>> First of all, the trips may not be ordered. There are no guarantees
> >>> whatsoever that they will be ordered, so the caller may have to
> >>> determine the temperature order in the first place. This would be an
> >>> extra requirement that currently is not there.
> >>>
> >>> Apart from this, I don't see any fundamental difference between the
> >>> case when trip points are updated via sysfs and when they are updated
> >>> by the driver. The governor should reset itself in any of those cases
> >>> and even if one trip point changes, the temperature order of all of
> >>> them may change, so the governor reset mechanism should be able to
> >>> handle the case when multiple trip points are updated at the same
> >>> time. While you may argue that this is theoretical, the ACPI spec
> >>> clearly states that this is allowed to happen, for example.
> >>>
> >>> If you want a generic reset callback for governors, that's fine, but
> >>> make it generic and not specific to a particular use case.
> >>
> >> I think we agree here, but probably having slightly different
> >> implementation in mind. Based on you explanation I think you
> >> want simply this API:
> >> void (*reset)(struct thermal_zone_device *tz);
> >>
> >> 1. no return value
> >> 2. no specific trip ID as argument
> >>
> >> Do you agree?
> >
> > Yes, I do.
>
> OK, thanks.
>
> Di could you implement that 'reset()' API according to this description,
> please?
>
Yes, happy to do that.

> >
> >> IMO such implementation and API would also work for this IPA
> >> purpose. Would that work for the ACPI use case as well?
> >
> > It would AFAICS.
>
> Thanks Rafael for the comments and the progress that we made :)
>
> Regards,
> Lukasz
>
> [1]
> https://elixir.bootlin.com/linux/v6.3/source/drivers/thermal/gov_bang_bang.c#L80

Thanks Lukas and Rafeal for the comments. I will send the next version later.

Best regards,
Di