Re: [PATCH V4] thermal/core/power_allocator: reset thermal governor when trip point is changed

From: Di Shen
Date: Sun Jun 25 2023 - 04:39:58 EST


On Fri, Jun 23, 2023 at 4:10 PM Lukasz Luba <lukasz.luba@xxxxxxx> wrote:
>
>
>
> On 6/22/23 19:27, Rafael J. Wysocki wrote:
> > On Tue, Jun 20, 2023 at 1:56 PM Lukasz Luba <lukasz.luba@xxxxxxx> wrote:
> >>
> >>
> >>
> >> On 6/20/23 11:39, Rafael J. Wysocki wrote:
> >>> On Tue, Jun 20, 2023 at 12:19 PM Lukasz Luba <lukasz.luba@xxxxxxx> wrote:
> >>>>
> >>>> Hi Rafael,
> >>>>
> >>>>
> >>>> On 6/20/23 11:07, Rafael J. Wysocki wrote:
> >>>>> On Tue, Jun 20, 2023 at 11:46 AM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
>
> [snip]
>
> >
> > Because this is up to the governor, the core has no clue what to do
> > with the return value from ->reset() and so there should be none.
> >
> > As I said, governors can print whatever diagnostic messages they like
> > in that callback, but returning anything from it to the core is just
> > not useful IMV.
> >
> >> For the rest of the governors - it's up to them what they
> >> report in case non-passive trip is updated...
> >
> > Sure.
> >
> >>>
> >>>> What Di is facing is in the issue under the bucket of
> >>>> 'handle_non_critical_trips()' when the governor just tries to
> >>>> work on stale data - old trip temp.
> >>>
> >>> Well, fair enough, but what about the other governors? Is this
> >>> problem limited to power_allocator?
> >>
> >> IIUC the core fwk code - non of the governors would be needed
> >> to handle the critical/hot trips. For the rest of the trip types
> >> I would say it's up to the governor. In our IPA case this stale
> >> data is used for power budget estimation - quite fundamental
> >> step. Therefore, the reset and start from scratch would make more
> >> sense.
> >>
> >> I think other governors don't try to 'estimate' such
> >> abstract power headroom based on temperature - so probably
> >> they don't have to even implement the 'reset()' callback
> >> (I don't know their logic that well).
> >
> > So there seems to be a claim that IPA is the only governor needing the
> > ->reset() callback, but I have not seen any solid analysis confirming
> > that. It very well may be the case, but then the changelog should
> > clearly explain why this is the case IMO.
>
> I agree, the patch header doesn't explain that properly. Here is the
> explanation for this Intelligent Power Allocator (IPA):
>
> The IPA controls temperature using PID mechanism. It's a closed
> feedback loop. That algorithm can 'learn' from the 'observed'
> in the past reaction for it's control decisions and accumulates that
> information in the part called 'error integral'. Those accumulated
> 'error' gaps are the differences between the set target value and the
> actually achieved value. In our case the target value is the target
> temperature which is coming from the trip point. That part is then used
> with the 'I' (of PID) component, so we can compensate for those
> 'learned' mistakes.
> Now, when you change the target temperature value - all your previous
> learned errors won't help you. That's why Intelligent Power Allocator
> should reset previously accumulated history.
>

Yes, THAT's the point!
Maybe I need to write the commit message in more detail.

> >
> >>>
> >>>> For the 2nd case IIUC the code, we pass the 'trip.temperature'
> >>>> and should be ready for what you said (modification of that value).
> >>>
> >>> Generally speaking, it needs to be prepared for a simultaneous change
> >>> of multiple trip points (including active), in which case it may not
> >>> be useful to invoke the ->reset() callback for each of them
> >>> individually.
> >>
> >> Although, that looks more cleaner IMO. Resetting one by one in
> >> a temperature order would be easily maintainable, won't be?
> >
> > I wouldn't call it maintainable really.
> >
> > First of all, the trips may not be ordered. There are no guarantees
> > whatsoever that they will be ordered, so the caller may have to
> > determine the temperature order in the first place. This would be an
> > extra requirement that currently is not there.
> >
> > Apart from this, I don't see any fundamental difference between the
> > case when trip points are updated via sysfs and when they are updated
> > by the driver. The governor should reset itself in any of those cases
> > and even if one trip point changes, the temperature order of all of
> > them may change, so the governor reset mechanism should be able to
> > handle the case when multiple trip points are updated at the same
> > time. While you may argue that this is theoretical, the ACPI spec
> > clearly states that this is allowed to happen, for example.
> >
> > If you want a generic reset callback for governors, that's fine, but
> > make it generic and not specific to a particular use case.
>
> I think we agree here, but probably having slightly different
> implementation in mind. Based on you explanation I think you
> want simply this API:
> void (*reset)(struct thermal_zone_device *tz);
>
> 1. no return value
> 2. no specific trip ID as argument
>
> Do you agree?
> IMO such implementation and API would also work for this IPA
> purpose. Would that work for the ACPI use case as well?