Re: [PATCH v3 2/4] hwmon: (lm90) Use hwmon_notify_event()

From: Dmitry Osipenko
Date: Mon Feb 21 2022 - 11:13:09 EST


21.02.2022 19:02, Guenter Roeck wrote:
> On 2/21/22 07:49, Jon Hunter wrote:
>>
>> On 21/02/2022 15:43, Guenter Roeck wrote:
>>
>> ...
>>
>>>> We observed a random null pointer dereference crash somewhere in the
>>>> thermal core (crash log below is not very helpful) when calling
>>>> mutex_lock(). It looks like we get an interrupt when this crash
>>>> happens.
>>>>
>>>> Looking at the lm90 driver, per the above, I now see we are calling
>>>> hwmon_notify_event() from the lm90 interrupt handler. Looking at
>>>> hwmon_notify_event() I see that ...
>>>>
>>>> hwmon_notify_event()
>>>>    --> hwmon_thermal_notify()
>>>>      --> thermal_zone_device_update()
>>>>        --> update_temperature()
>>>>          --> mutex_lock()
>>>>
>>>> So although I don't completely understand the crash, it does seem
>>>> that we should not be calling hwmon_notify_event() from the
>>>> interrupt handler.
>>>>
>>> As mentioned separately, this is not the problem.
>>
>> Yes I can see that now.
>>
>>> I think the problem may be that this is not a devicetree system
>>> (or the lm90 device does not have a devicetree node), but thermal
>>> notification currently only works in such systems because the hwmon
>>> subsystem uses the devicetree registration method. At the same time,
>>> CONFIG_THERMAL_OF is obviously enabled. Unfortunately, the hwmon code
>>> does not bail out in that situation due to another bug.
>>
>> The platform I see this on does use device-tree and it does have a
>> node for the ti,tmp451 device which uses the lm90 driver. This
>> platform uses the device-tree source
>> arch/arm64/boot/dts/nvidia/tegra194-p2972-0000.dts and the tmp451 node
>> is in arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi.
>>
>
> Interesting. It appears that the call to
> devm_thermal_zone_of_sensor_register() in the hwmon core nevertheless
> returns -ENODEV, which is not handled properly in the hwmon core. I can
> see a number of reasons for this to happen:
> - there is no devicetree node for the lm90 device
> - there is no thermal-zones devicetree node
> - there is no thermal zone entry in the thermal-zones node which matches
>   the sensor
>
> We'll have to revert the lm90 changes until this is sorted out.

Oh, yeah. It seems there is a problem there and the tzd pointer could
hold -ENODEV. But it's a hwmon core problem, which has apparently existed
for a long time, not an lm90 problem.
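
To illustrate why a tzd pointer that carries -ENODEV is dangerous, here is
a minimal userspace sketch. It only imitates the idea behind the kernel's
ERR_PTR()/IS_ERR() helpers; err_ptr(), is_err(), ptr_err(), struct
fake_zone and notify_zone() are hypothetical names made up for this
example and are not the hwmon or thermal core code. The point is that an
error-encoded pointer is not NULL, so it has to be checked explicitly
before anything dereferences it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: same range idea as the kernel's error pointers, where the
 * top 4095 addresses are reserved for negative errno values. */
#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer (imitates ERR_PTR()). */
static inline void *err_ptr(long err)
{
	assert(err < 0 && err >= -MAX_ERRNO);
	return (void *)(intptr_t)err;
}

/* True if the pointer falls into the reserved error range (imitates IS_ERR()). */
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Recover the errno value from an error pointer (imitates PTR_ERR()). */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Hypothetical stand-in for a thermal zone; not the kernel structure. */
struct fake_zone {
	int updates;
};

/* Hypothetical notifier: touch the zone only if the pointer is usable.
 * A plain NULL check would let err_ptr(-ENODEV) through and crash on the
 * dereference below. */
static bool notify_zone(struct fake_zone *tzd)
{
	if (tzd == NULL || is_err(tzd))
		return false;
	tzd->updates++;
	return true;
}
```

With this, notify_zone(err_ptr(-ENODEV)) is rejected instead of
dereferencing a bogus address, while a pointer to a real struct fake_zone
is updated as usual. Whether the actual fix belongs at registration time
or at notification time is a separate question for the hwmon core.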