Re: [PATCH v3 2/4] hwmon: (lm90) Use hwmon_notify_event()

From: Guenter Roeck
Date: Mon Feb 21 2022 - 11:02:24 EST


On 2/21/22 07:49, Jon Hunter wrote:

On 21/02/2022 15:43, Guenter Roeck wrote:

...

We observed a random NULL pointer dereference crash somewhere in the
thermal core (crash log below is not very helpful) when calling
mutex_lock(). It looks like we get an interrupt when this crash
happens.

Looking at the lm90 driver, per the above, I now see we are calling
hwmon_notify_event() from the lm90 interrupt handler. Looking at
hwmon_notify_event() I see that ...

hwmon_notify_event()
   --> hwmon_thermal_notify()
     --> thermal_zone_device_update()
       --> update_temperature()
         --> mutex_lock()

So although I don't completely understand the crash, it does seem
that we should not be calling hwmon_notify_event() from the
interrupt handler.

As mentioned separately, this is not the problem.

Yes I can see that now.

I think the problem may be that this is not a devicetree system
(or the lm90 device does not have a devicetree node), but thermal
notification currently only works in such systems because the hwmon
subsystem uses the devicetree registration method. At the same time,
CONFIG_THERMAL_OF is obviously enabled. Unfortunately, the hwmon code
does not bail out in that situation due to another bug.

The platform I see this on does use device-tree and it does have a
node for the ti,tmp451 device, which uses the lm90 driver. This
platform uses the device-tree source
arch/arm64/boot/dts/nvidia/tegra194-p2972-0000.dts and the tmp451
node is in arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi.


Interesting. It appears that the call to devm_thermal_zone_of_sensor_register()
in the hwmon core nevertheless returns -ENODEV, and the hwmon core does not
handle that return value properly. I can see a number of reasons for this to
happen:
- there is no devicetree node for the lm90 device
- there is no thermal-zones devicetree node
- there is no thermal zone entry in the thermal-zones node which matches
  the sensor
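
To illustrate the kind of handling that seems to be missing, here is a
minimal userspace sketch, not the actual hwmon core code. It assumes that
-ENODEV from the registration call simply means "no thermal zone is attached
to this sensor" and should be treated as non-fatal, with no zone pointer kept
around for the notify path to dereference later. All names in it
(struct sensor, mock_zone_register(), sensor_add_thermal(), sensor_notify())
are hypothetical, and ERR_PTR()/PTR_ERR()/IS_ERR() are userspace stand-ins
for the kernel helpers of the same name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct thermal_zone { int id; };

/* Hypothetical sensor state; tzd stays NULL when no zone is attached. */
struct sensor {
	int has_zone_in_dt;        /* mock: does a thermal zone match this sensor? */
	struct thermal_zone *tzd;
};

static struct thermal_zone mock_zone = { .id = 1 };

/* Mock registration call: ERR_PTR(-ENODEV) when no zone matches the sensor. */
static struct thermal_zone *mock_zone_register(const struct sensor *s)
{
	return s->has_zone_in_dt ? &mock_zone : ERR_PTR(-ENODEV);
}

/* Register with the (mock) thermal core; -ENODEV is treated as non-fatal. */
static int sensor_add_thermal(struct sensor *s)
{
	struct thermal_zone *tzd;

	assert(s != NULL);

	tzd = mock_zone_register(s);
	if (IS_ERR(tzd)) {
		s->tzd = NULL;             /* never keep an error pointer around */
		if (PTR_ERR(tzd) != -ENODEV)
			return (int)PTR_ERR(tzd);
		return 0;                  /* no zone for this sensor: not an error */
	}

	s->tzd = tzd;
	return 0;
}

/* Notify path: returns 1 if an update was forwarded, 0 if it was skipped. */
static int sensor_notify(const struct sensor *s)
{
	if (!s->tzd)
		return 0;                  /* nothing registered, nothing to update */

	printf("updating thermal zone %d\n", s->tzd->id);
	return 1;
}
```

With this shape, a sensor without a matching thermal zone still registers
successfully, and a later notification for it is skipped instead of being
passed on to the thermal core with an invalid zone pointer.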

We'll have to revert the lm90 changes until this is sorted out.

Guenter