Re: Bug: power supply - NULL pointer dereference in bq27x00 driver

From: Dr. H. Nikolaus Schaller
Date: Mon May 18 2015 - 09:31:08 EST


Hi Krzysztof,

Am 18.05.2015 um 14:32 schrieb Krzysztof Kozlowski <k.kozlowski@xxxxxxxxxxx>:

> 2015-05-18 18:40 GMT+09:00 Dr. H. Nikolaus Schaller <hns@xxxxxxxxxxxxx>:
>> Hi,
>> we tried to upgrade from 4.0 to 4.1-rc3 kernel and ran into a NULL pointer problem within the bq27x00 driver.
>>
>> It appears to be introduced by your patch 297d716f6260cc9421d971b124ca196b957ee458
>>
>> The problem appears to be that bq27x00_powersupply_init() calls power_supply_register_no_ws() and
>> sets di->bat *after* return. The old code did pass an uninitialized struct pointer.
>>
>> Now for reasons I don’t understand, the power_supply_register_no_ws() appears to call
>> uevent related stuff which in turn calls bq27x00_battery_get_property() before di->bat
>> is properly initialized.
>>
>> I have checked with printk in bq27x00_battery_get_property() that di>bat == NULL in this case and
>> right before we see the segfault.
>>
>> The old code simply did pass a zeroed struct power_supply and perhaps initialized its components
>> during registration.
>>
>> Returning some -EINVAL if di->bat == NULL would likely solve the NULL pointer dereference but
>> I don’t know what it does to the uevent and if it restores previous operation.
>>
>> It could have been that it was for good purpose that power_supply_register_no_ws() did not return
>> by value, but by reference to the di->bat struct:
>>
>> - ret = power_supply_register_no_ws(di->dev, &di->bat, NULL);
>> + di->bat = power_supply_register_no_ws(di->dev, psy_desc, &psy_cfg);
>>
>> So that code called within the context of power_supply_register_no_ws() could already
>> refer to initialized di->bat.
>
> Indeed this issue was not present in previous design however I think
> the architecture was still error-prone. Starting from driver's probe:
> - some_driver_probe()
> - power_supply_register(&psy)
> - device_add()
> - kobject_uevent() - notify userspace
> - power_supply_uevent()
> - power_supply_show_property()
> - power_supply_get_property()
> - some_driver_get_property()
>
> So before probe() ends the power supply core calls driver's
> get_property(). In that time the driver internal structures may not be
> ready yet, because the probe did not end.

Yes, that is indeed a problem. Maybe it did work with the bq27x00 driver
earlier because the call to power_supply_register(&psy) was the last
activity.

> The get_property() could
> access some registers or other core functions which will be ready
> after power_supply_register() call. For example the
> bq27x00_battery_get_property() accesses delayed work which could be
> (by mistake) not yet initialized.
>
> I looked at other dev_uevent implementations and almost all of them do
> not call back the driver.
>
> Of course this does not change that I introduced the issue and I feel
> bad about it.
> I got some ideas for resolving it:
> 1. Fix bq27x00_battery and maybe others so they won't access the 'psy'
> member of its state container. This does not solve the issue from
> architectural point of view - still some uninitialised data may be
> accessed because probe() is in progress. However this would be the
> fastest and the least intrusive.

The problem might be that it fundamentally changes the driver code
architecture. For example one call using di->bat is in

bq27x00_battery_status() {

else if (power_supply_am_i_supplied(di->bat))
status = POWER_SUPPLY_STATUS_NOT_CHARGING;

}

>
> 2. Remove calls to get_property() (and other functions provided by
> driver) from power_supply_uevent(). Unfortunately this may break
> userspace which expects that some data will be present in uevent.
>
> 3. Change the API to the previous convention, which you pointed as a remedy:
> ret = power_supply_register_no_ws(di->dev, &di->bat, psy_desc, &psy_cfg);
> This also won't solve the issue from 1. that uevent will be called
> before probe ends.

Well, we could require that power_supply_register_no_ws() is the last
activity to be done in any_driver_probe().

>
> 4. Block the power_supply_uevent() from calling the get_property()
> functions until device_add() finishes. This would solve this issue but
> first uevent messages (from adding device) won't contain all of device
> data (just like in solution no 2.) and this won't solve other race -
> someone may call sysfs show() on created device nodes and thus access
> the get_property() before probe finishes.

>
> Any ideas?

5. is it possible to delay the call to kobject_uevent() after some_driver_probe()
is finished? E.g. by registering a one-shot delayed work?

There seems to be a shared workqueue (mentioned e.g. in
<http://www.makelinux.net/ldd3/chp-7-sect-6>)
but that is an area of the kernel I am not familiar with.

BR,
Nikolaus

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/