Re: [PATCH v2] thermal: Fix a NULL pointer dereference

From: Subbaraman Narayanamurthy
Date: Tue Oct 05 2021 - 18:09:47 EST


On 9/17/21 1:06 PM, Subbaraman Narayanamurthy wrote:
> On 9/17/21 2:31 AM, Daniel Lezcano wrote:
>> On 07/09/2021 21:01, Subbaraman Narayanamurthy wrote:
>>> of_parse_thermal_zones() parses the thermal-zones node and registers a
>>> thermal_zone device for each subnode. However, if a thermal zone is
>>> consuming a thermal sensor and that thermal sensor device hasn't probed
>>> yet, an attempt to set trip_point_*_temp for that thermal zone device
>>> can cause a NULL pointer dereference. Fix it.
>>>
>>> console:/sys/class/thermal/thermal_zone87 # echo 120000 > trip_point_0_temp
>>> ...
>>> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
>>> ...
>>> Call trace:
>>> of_thermal_set_trip_temp+0x40/0xc4
>>> trip_point_temp_store+0xc0/0x1dc
>>> dev_attr_store+0x38/0x88
>>> sysfs_kf_write+0x64/0xc0
>>> kernfs_fop_write_iter+0x108/0x1d0
>>> vfs_write+0x2f4/0x368
>>> ksys_write+0x7c/0xec
>>> __arm64_sys_write+0x20/0x30
>>> el0_svc_common.llvm.7279915941325364641+0xbc/0x1bc
>>> do_el0_svc+0x28/0xa0
>>> el0_svc+0x14/0x24
>>> el0_sync_handler+0x88/0xec
>>> el0_sync+0x1c0/0x200
>>>
>>> While at it, fix the possible NULL pointer dereference in other
>>> functions as well: of_thermal_get_temp(), of_thermal_set_emul_temp(),
>>> of_thermal_get_trend().
>>>
>>> Cc: stable@xxxxxxxxxxxxxxx
>>> Suggested-by: David Collins <quic_collinsd@xxxxxxxxxxx>
>>> Signed-off-by: Subbaraman Narayanamurthy <quic_subbaram@xxxxxxxxxxx>
>>> ---
>>> Changes for v2:
>>> - Added checks in of_thermal_get_temp(), of_thermal_set_emul_temp(), of_thermal_get_trend().
>>>
>>> drivers/thermal/thermal_of.c | 9 ++++++---
>>> 1 file changed, 6 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c
>>> index 6379f26..9233f7e 100644
>>> --- a/drivers/thermal/thermal_of.c
>>> +++ b/drivers/thermal/thermal_of.c
>>> @@ -89,7 +89,7 @@ static int of_thermal_get_temp(struct thermal_zone_device *tz,
>>> {
>>> struct __thermal_zone *data = tz->devdata;
>>>
>>> - if (!data->ops->get_temp)
>>> + if (!data->ops || !data->ops->get_temp)
>> comment (1)
>>
>> AFAICT, if data->ops != NULL then data->ops->get_temp is also != NULL
>> because of the code allocating/freeing the ops structure.
>>
>> The tests can be replaced by (!data->ops), no ?
> Thanks Daniel for reviewing the patch.
>
> I agree that even if a sensor module is unregistered, that would call
> "thermal_zone_of_sensor_unregister" which would eventually set NULL on
> get_temp() and get_trend() and "tzd->ops" as well.
>
> However, of_thermal_get_temp() is trying to call "data->ops->get_temp"
> which comes from a sensor driver when it registers. There is no
> guarantee that it would be non-NULL right?
>
> Thinking of which, I think having both checks would be valid.

Hi Daniel,
Do you still have any concerns with this change?

Thanks,
Subbaraman

>>> return -EINVAL;
>>>
>>> return data->ops->get_temp(data->sensor_data, temp);
>>> @@ -186,6 +186,9 @@ static int of_thermal_set_emul_temp(struct thermal_zone_device *tz,
>>> {
>>> struct __thermal_zone *data = tz->devdata;
>>>
>>> + if (!data->ops || !data->ops->set_emul_temp)
>>> + return -EINVAL;
>>> +
>> comment (2)
>>
>> The test looks pointless here (I mean both of them).
>>
>> If of_thermal_set_emul_temp() is called it is because the callback was
>> set in thermal_zone_of_add_sensor().
>>
>> This one does:
>>
>> tz->ops = ops;
>>
>> and
>> if (ops->set_emul_temp)
>> tzd->ops->set_emul_temp = of_thermal_set_emul_temp;
>>
>> If I'm not wrong if we are called, then data->ops &&
>> data->ops->set_emul_temp is always true, right ?
>>
> I've not exercised this condition yet. However, the original problem
> we've observed was when thermal HAL was trying to set trip thresholds
> on a thermal zone device for which the sensor device is not probed yet.
> This had happened randomly because of vendor modules taking time to be
> loaded and probed. I don't know if there would be any userspace entity
> that can try to set emulated temperature for a thermal zone even before
> a sensor device is probed.
>
> Without a sensor driver probed, "tz->ops" would not have a valid pointer
> right? So, I think checking for "data->ops" should be good.
>
> Another possibility is, a sensor might not have "set_emul_temp" callback.
> So checking for "ops->set_emul_temp" should be still valid.
>
>>> return data->ops->set_emul_temp(data->sensor_data, temp);
>>> }
>>>
>>> @@ -194,7 +197,7 @@ static int of_thermal_get_trend(struct thermal_zone_device *tz, int trip,
>>> {
>>> struct __thermal_zone *data = tz->devdata;
>>>
>>> - if (!data->ops->get_trend)
>>> + if (!data->ops || !data->ops->get_trend)
>>> return -EINVAL;
>> Same comment as (1)
> of_thermal_get_trend() is trying to call "data->ops->get_trend" which
> comes from a sensor driver when it registers. From what I can see,
> there are lot of drivers which don't pass "get_trend" in their ops.
> So there is no guarantee that it would be non-NULL right?
>
> Thinking of which, I think having both checks would be valid.
>
>>> return data->ops->get_trend(data->sensor_data, trip, trend);
>>> @@ -301,7 +304,7 @@ static int of_thermal_set_trip_temp(struct thermal_zone_device *tz, int trip,
>>> if (trip >= data->ntrips || trip < 0)
>>> return -EDOM;
>>>
>>> - if (data->ops->set_trip_temp) {
>>> + if (data->ops && data->ops->set_trip_temp) {
>> Same comment as (2)
> Without a sensor driver probed, "tz->ops" would not have a valid pointer right?
> So, I think checking for "data->ops" should be good. Another possibility is, a
> sensor device might not have "set_trip_temp" callback but just "set_trips".
>
> So checking for "data->ops->set_trip_temp" might be still valid.
>
>>> int ret;
>>>
>>> ret = data->ops->set_trip_temp(data->sensor_data, trip, temp);
>>>