Hi Lukasz,
On 06/04/2021 14:25, Lukasz Luba wrote:
On 4/6/21 12:24 PM, Daniel Lezcano wrote:
In your proposed code there is 'tz->last_temperature < switch_on_temp'
which then return 0 immediately. So we don't poke the devices.
Ah yes, I see your point.
Often the device tree is setup with an additional trip point to trigger
a stat collection. So I assumed we don't need to feed the pid loop below
this temperature.
What is the goal of sampling at polling time (not passive) ?
Here are the boards with the extra trip point:
- mt8173.dtsi
- hi6220.dtsi
- hi6220.dtsi
- px30.dts
- rk3328.dtsi
- rk3399-gru-kevin.dts
Those have an interrupt mode but do polling also.
See the configuration for:
- sc7180.dtsi (no polling delay and no pre-trip point)
- r8a77990.dtsi
If the IPA needs a sampling, it may be more adequate to separate the
sampling from the polling. So the other governors won't be impacted by
the IPA specific setup, and we do not end up with polling/passive delays
tricks for a governor. The IPA would have its own self-encapsulated
sampling rate and the thermal zone setup won't depend on the governor.
What do you think ?
IMHO having a private timer in the governor creates another complexity
and confusion.
I would say we move the adherence between the thermal core and the IPA
into the governor only :)
Especially, we *have* to call throttle on a governor even if we are not
in the mitigation process.
And from a design POV, it should be the thermal core to be in control of
what is happening, not passively call the different callbacks and expect
them to behave correctly (eg. set tz->passive)
What we have in thermal now is good enough. We have DT support for both
periods so there is need even to write via sysfs:
polling-delay-passive
polling-delay
The device driver developers can rely on this reliable check in the
thermal framework.
I don't agree that IPA forces any specific setup.
Yes, it does IMO.
For instance, on the hi3660, the sensor is able to do interrupt mode, so
to be wake up when the first temperature threshold is reached.
But there is still the polling delay because the governor is IPA in this
case? There is also an additional trip-point0 which is not a target for
a cooling device, just put there to ensure the IPA has enough data when
reaching the second trip point which is the target.
If, for any reason, we want to switch to step_wise, where the interrupt
mode is enough to trigger the mitigation (eg. with passive polling), and
ensure the system is not constantly waking up (and AFAICT even 1s
periodic wake up can have a significant impact on battery life), it
won't be possible because of the device tree.
Moreover, some sensors do not use their interrupt mode because of IPA
setup or they use it incorrectly.
See my concern here ? IPA has an impact on the thermal core and the
sensor, while those should be governor agnostic.
If the thermal is
configured to do the polling of the temp sensor, because maybe there
are no HW interrupts, then there is no other way. The Arm SCMI sensors
were one of them, where we had to send a SCMI request. There was no
notifications/IRQs that temp crossed some trip point. Now it should be
better, but still it depends on vendor's FW implementation if there is
IRQ.
I don't know all the platforms but so far the interrupt mode is largely
supported, to not say in the vast majority.
The reliable polling is not IPA 'feature request'.
We cannot avoid polling in some configurations. Thermal framework
must support this scenario: polling/checking temp sensor even when
the temp is low.
I agree with that, I'm not questioning about removing the polling.
Thus, since framework must check the temp, calling
the governor->throttle() doesn't harm (as I said every 1sec).
Furthermore, the governor interprets what trip point and temperature to
interpret and how to react.
Precisely, that is the reason why I disagree. The thermal core should
switch on/off the mitigation (say cool down / warm up) and the governor
applies its recipe. With polling and sampling tied together it is not
possible to create self-encapsulated components. As a result we have bug
like [1] :/