Re: [PATCH] thermal/core: Correctly free tz->tzp in thermal zone registration error path

From: Chen-Yu Tsai
Date: Mon Jan 08 2024 - 22:46:06 EST


On Tue, Dec 19, 2023 at 11:28 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
>
> On Tue, Dec 19, 2023 at 9:27 AM Chen-Yu Tsai <wenst@xxxxxxxxxxxx> wrote:
> >
> > After commit 3d439b1a2ad3 ("thermal/core: Alloc-copy-free the thermal
> > zone parameters structure"), the core now copies the thermal zone
> > parameters structure, and frees it if an error happens during thermal
> > zone device registration, or upon unregistration of the device.
> >
> > In the error path, if device_register() was called, then `tz` disappears
> > before kfree(tz->tzp) happens, causing a NULL pointer dereference crash.
> >
> > In my case, the error path was entered from the sbs power supply driver,
> > which through the power supply core registers a thermal zone *without
> > trip points* for the battery temperature sensor. This combined with
> > setting the default thermal governor to "power allocator", which
> > *requires* trip_max, causes the thermal zone registration to error out.
> >
> > The error path should handle two cases: one where device_register()
> > has not been called and the kobj is not yet reference counted, and one
> > where it has. The original commit tried to cover the first case, but
> > fails for the second. Fix this by adding kfree(tz->tzp) before
> > put_device() to cover the second case, and by checking whether `tz` is
> > still valid before calling kfree(tz->tzp) at the free_tzp label, so the
> > second case does not crash when it falls through.
> >
> > Fixes: 3d439b1a2ad3 ("thermal/core: Alloc-copy-free the thermal zone parameters structure")
> > Signed-off-by: Chen-Yu Tsai <wenst@xxxxxxxxxxxx>
> > ---
> > This includes the minimal changes to fix the crash. I suppose some other
> > things in the thermal core could be reworked:
> > - Don't use "power allocator" for thermal zones without trip points
> > - Move some of the thermal zone cleanup code into the release function
> >
> > drivers/thermal/thermal_core.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> > index 2415dc50c31d..e47826d82062 100644
> > --- a/drivers/thermal/thermal_core.c
> > +++ b/drivers/thermal/thermal_core.c
> > @@ -1392,12 +1392,16 @@ thermal_zone_device_register_with_trips(const char *type, struct thermal_trip *t
> >  unregister:
> >  	device_del(&tz->device);
> >  release_device:
> > +	/* Free tz->tzp before tz goes away. */
> > +	kfree(tz->tzp);
> >  	put_device(&tz->device);
> >  	tz = NULL;
> >  remove_id:
> >  	ida_free(&thermal_tz_ida, id);
> >  free_tzp:
> > -	kfree(tz->tzp);
> > +	/* If we arrived here before device_register() was called. */
> > +	if (tz)
> > +		kfree(tz->tzp);
> >  free_tz:
> >  	kfree(tz);
> >  	return ERR_PTR(result);
> > --
>
> Can you please test linux-next from today? The issue addressed by
> your patch should be fixed there.

Sorry for the very late reply. Yes, the issue is fixed in linux-next. Thanks.

ChenYu
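
For anyone reading this thread later, the two error-path cases described in
the quoted commit message can be sketched in plain userspace C. This is only
a simplified model, not the thermal core code: struct zone, zone_put(),
zone_register_failing() and zones_released are hypothetical names made up
for illustration, with zone_put() standing in for put_device() and free()
for kfree().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; none of these are kernel APIs. */
struct zone_params { int dummy; };

struct zone {
	int refcount;			/* models the kobject reference count */
	struct zone_params *tzp;	/* separately allocated parameter copy */
};

static int zones_released;		/* counts releases, for checking only */

/* Models put_device(): dropping the last reference frees the zone itself. */
static void zone_put(struct zone *z)
{
	if (--z->refcount == 0) {
		zones_released++;
		free(z);
	}
}

/*
 * fail_after_register selects which of the two error-path cases to model.
 * Always returns -1, since only the failure path is sketched here.
 */
static int zone_register_failing(int fail_after_register)
{
	struct zone *z = calloc(1, sizeof(*z));

	if (!z)
		return -1;

	z->tzp = calloc(1, sizeof(*z->tzp));
	if (!z->tzp)
		goto free_z;

	if (!fail_after_register)
		goto free_tzp;		/* case 1: no reference taken yet */

	z->refcount = 1;		/* models a successful device_register() */

	/* case 2: free the member first, because zone_put() frees z */
	free(z->tzp);
	zone_put(z);
	z = NULL;			/* z must not be touched from here on */

free_tzp:
	if (z)
		free(z->tzp);
free_z:
	free(z);			/* free(NULL) is a no-op */
	return -1;
}
```

Calling zone_register_failing(0) models the failure before registration,
where the labels at the bottom free everything. Calling
zone_register_failing(1) models the failure after registration, where the
member has to be freed before the last reference is dropped and the pointer
is cleared so the fall-through labels do nothing.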