Re: [PATCH v1 17/33] thermal/drivers/rcar: Switch to new of API

From: Niklas Söderlund
Date: Sun Jul 24 2022 - 18:39:21 EST


Hi Daniel,

I tested your branch, unfortunately with the same result for
rcar_gen3_thermal. Manipulating the emul_temp file does not trigger
any actions.

If I revert the following two commits on top of your branch:

409ca214f4c6bd5b ("thermal/of: Remove old OF code")
7b43f76d3428227e ("thermal/drivers/rcar: Switch to new of API")

I'm able to 'restore' the behavior where I can change the cooling state
and trigger the critical trip point using emul_temp to shut down the
board.

As the change in question also affects the rcar_thermal sensor I gave
that a try too. It has no cooling on the system I have, so my only
test-case is to write a temperature above the critical trip point to
emul_temp and see if that shuts down the system. And just as with
rcar_gen3_thermal, with your series nothing happens, while with the two
commits outlined above reverted the system shuts down.

echo 110000 > /sys/devices/virtual/thermal/thermal_zone0/emul_temp

If it's any help, writing to emul_temp does have some effect, as the
emulated temperature is read back from the temp sysfs file. For
rcar_thermal, where the critical trip point is 95 degrees:

* With this series
# grep . /sys/devices/virtual/thermal/thermal_zone0/trip_point_0_*
/sys/devices/virtual/thermal/thermal_zone0/trip_point_0_hyst:0
/sys/devices/virtual/thermal/thermal_zone0/trip_point_0_temp:95000
/sys/devices/virtual/thermal/thermal_zone0/trip_point_0_type:critical
# cat /sys/devices/virtual/thermal/thermal_zone0/temp
35000
# echo 50000 > /sys/devices/virtual/thermal/thermal_zone0/emul_temp
# cat /sys/devices/virtual/thermal/thermal_zone0/temp
50000
# echo 110000 > /sys/devices/virtual/thermal/thermal_zone0/emul_temp
# cat /sys/devices/virtual/thermal/thermal_zone0/temp
110000
*** system alive ***

* With this series and the two patches reverted or plain v5.19-rc4
# grep . /sys/devices/virtual/thermal/thermal_zone0/trip_point_0_*
/sys/devices/virtual/thermal/thermal_zone0/trip_point_0_hyst:0
/sys/devices/virtual/thermal/thermal_zone0/trip_point_0_temp:95000
/sys/devices/virtual/thermal/thermal_zone0/trip_point_0_type:critical
# cat /sys/devices/virtual/thermal/thermal_zone0/temp
35000
# echo 50000 > /sys/devices/virtual/thermal/thermal_zone0/emul_temp
# cat /sys/devices/virtual/thermal/thermal_zone0/temp
50000
# echo 110000 > /sys/devices/virtual/thermal/thermal_zone0/emul_temp
[ 121.380054] thermal thermal_zone0: cpu-thermal: critical temperature reached, shutting down
[ 121.388482] reboot: HARDWARE PROTECTION shutdown (Temperature too high)
*** system shuts down ***
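
In case it is useful for reproducing this, the manual check above could
be scripted roughly as below. This is only a sketch: the function name
zone_over_critical is made up for illustration, and it assumes the zone
directory exposes the same temp and trip_point_N_{type,temp} files as
shown in the transcripts.

```shell
#!/bin/sh
# Sketch only: report whether a thermal zone's reported temperature is
# at or above one of its critical trip points.
# zone_over_critical is a hypothetical helper name, not a kernel tool.
# $1 is the zone directory, e.g. /sys/devices/virtual/thermal/thermal_zone0

zone_over_critical() {
    zone="$1"
    # Current (possibly emulated) temperature in millidegrees Celsius.
    temp=$(cat "$zone/temp") || return 2

    for type_file in "$zone"/trip_point_*_type; do
        # Skip the unexpanded glob if the zone has no trip points.
        [ -e "$type_file" ] || continue
        [ "$(cat "$type_file")" = "critical" ] || continue

        # trip_point_N_type -> trip_point_N_temp
        trip=$(cat "${type_file%_type}_temp")
        if [ "$temp" -ge "$trip" ]; then
            echo "temp $temp >= critical trip $trip"
            return 0
        fi
    done

    echo "temp $temp is below every critical trip (or none is defined)"
    return 1
}
```

If this returns 0 after writing 110000 to emul_temp and the board is
still running, the trip point was crossed without the core acting on it,
which is what I observe with the series applied.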

And to make it more problematic I don't think the lack of action is
limited to the emul_temp interface. With rcar_thermal I lowered the
critical trip point value to 45C and used the cpuburn application to
generate load and raise the temperature.

The result mirrors the findings above: with your branch the system does
not trigger the critical trip point. If I revert the two commits or run
plain v5.19-rc4, once the temperature reaches 45C the critical trip
point kicks in and shuts down the system.

I hope this helps. I'm sorry I couldn't find the real issue by digging in
the core changes. I'm happy to help trying to find the root cause for
this and I think the idea behind the new API is good.

On 2022-07-24 23:11:47 +0200, Daniel Lezcano wrote:
>
> Hi Niklas,
>
> I gave it another try but failed to reproduce the issue. Perhaps my board has a
> path different from yours.
>
> Thanks for proposing to test the series. I've uploaded the branch here:
>
> https://github.com/dlezcano/linux-thermal
>
>
> On 24/07/2022 21:00, Niklas Söderlund wrote:
> > Hi Daniel,
> >
> > On 2022-07-24 20:27:54 +0200, Daniel Lezcano wrote:
> > > Hi Niklas,
> > >
> > > I tried to reproduce the issue but without success.
> > >
> > > What sensor are you using ?
> > I was using rcar_gen3_thermal.
> >
> > I did my tests starting on v5.19-rc7 and then picked '[PATCH v5 00/12]
> > thermal OF rework' from [1] and finally applied this full series on-top
> > of that. If you have a branch or some specific test you wish me to try
> > I'm happy to do so.
> >
> > 1. https://lore.kernel.org/lkml/20220710123512.1714714-1-daniel.lezcano@xxxxxxxxxx/
> >
> > >
> > > On 19/07/2022 11:10, Niklas Söderlund wrote:
> > > > Hi Daniel,
> > > >
> > > > Thanks for your work.
> > > >
> > > > On 2022-07-10 23:24:07 +0200, Daniel Lezcano wrote:
> > > > > The thermal OF code has a new API allowing to migrate the OF
> > > > > initialization to a simpler approach.
> > > > >
> > > > > Use this new API.
> > > > I tested this together with the series it depends on and while
> > > > temperature monitoring seems to work fine it breaks the emul_temp
> > > > interface (/sys/class/thermal/thermal_zone2/emul_temp).
> > > >
> > > > Before this change I can write a temperature to this file and have it
> > > > trigger actions, in my test-case changing the cooling state, which I
> > > > observe in /sys/class/thermal/cooling_device0/cur_state.
> > > >
> > > > Likewise before this change I could trip the critical trip-point that
> > > > would power off the board using the emul_temp interface, this too no
> > > > longer works,
> > > >
> > > > echo 120000 > /sys/class/thermal/thermal_zone2/emul_temp
> > > >
> > > > Is this an intentional change of the new API?
> > >
> > >
> > >
>

--
Kind Regards,
Niklas Söderlund