Re: OOB access on ACPI processor thermal device via sysfs write

From: Takashi Iwai
Date: Thu Apr 02 2020 - 06:09:56 EST


On Thu, 02 Apr 2020 12:03:30 +0200,
Zhang Rui wrote:
>
> On Thu, 2020-04-02 at 11:03 +0200, Takashi Iwai wrote:
> > On Thu, 02 Apr 2020 09:47:50 +0200,
> > Zhang, Rui wrote:
> > >
> > > CC Viresh.
> > >
> > > Yes, I've received it.
> > >
> > > To me, there is not a hard rule that the cooling device max_state
> > > must be static.
> > > We should be able to detect the max_state change and reset the
> > > stats table when necessary.
> > >
> > > I just finished a prototype patch to do so, and will paste it
> > > later.
> >
> > Great, that sounds like a feasible option, indeed.
> >
> >
> Please try the patch below and see if the problem goes away or not.
>
> >From 7b429674a0e1a6226734c8919b876bb57d946b1d Mon Sep 17 00:00:00 2001
> From: Zhang Rui <rui.zhang@xxxxxxxxx>
> Date: Thu, 2 Apr 2020 11:18:44 +0800
> Subject: [RFC PATCH] thermal: update thermal stats table when max cooling
> state changed
>
> The maximum cooling state of a cooling device may be changed at
> runtime. Thus the statistics table must be updated to handle the real
> maximum cooling states supported.
>
> This fixes an OOB issue when updating the statistics of the processor
> cooling device, because it only supports 1 cooling state before cpufreq
> driver loaded.
>
> Fixes: 8ea229511e06 ("thermal: Add cooling device's statistics in sysfs")
> Reported-by: Takashi Iwai <tiwai@xxxxxxx>
> Signed-off-by: Zhang Rui <rui.zhang@xxxxxxxxx>
> ---
> drivers/thermal/thermal_sysfs.c | 38 +++++++++++++++++++++++++++------
> 1 file changed, 31 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
> index aa99edb4dff7..c69173eb4b24 100644
> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c
> @@ -755,6 +755,9 @@ struct cooling_dev_stats {
> unsigned int *trans_table;
> };
>
> +static int cooling_device_stats_table_update(struct thermal_cooling_device *cdev);
> +static void cooling_device_stats_destroy(struct thermal_cooling_device *cdev);
> +
> static void update_time_in_state(struct cooling_dev_stats *stats)
> {
> ktime_t now = ktime_get(), delta;
> @@ -768,8 +771,12 @@ static void update_time_in_state(struct cooling_dev_stats *stats)
> void thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev,
> unsigned long new_state)
> {
> - struct cooling_dev_stats *stats = cdev->stats;
> + struct cooling_dev_stats *stats;
>
> + if (cooling_device_stats_table_update(cdev))
> + return;
> +
> + stats = cdev->stats;
> spin_lock(&stats->lock);
>
> if (stats->state == new_state)
> @@ -904,24 +911,32 @@ static const struct attribute_group cooling_device_stats_attr_group = {
> .name = "stats"
> };
>
> -static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
> +static int cooling_device_stats_table_update(struct thermal_cooling_device *cdev)
> {
> struct cooling_dev_stats *stats;
> unsigned long states;
> - int var;
> + int var, ret;
>
> - if (cdev->ops->get_max_state(cdev, &states))
> - return;
> + ret = cdev->ops->get_max_state(cdev, &states);
> + if (ret)
> + return ret;
>
> states++; /* Total number of states is highest state + 1 */
>
> + stats = cdev->stats;
> + if (stats) {
> + if (stats->max_states == states)
> + return 0;
> + else
> + cooling_device_stats_destroy(cdev);
> + }

This looks racy. We may have concurrent accesses and it'll lead to
another access-after-free.

> var = sizeof(*stats);
> var += sizeof(*stats->time_in_state) * states;
> var += sizeof(*stats->trans_table) * states * states;
> -
> stats = kzalloc(var, GFP_KERNEL);
> if (!stats)
> - return;
> + return -ENOMEM;

... and this leaves NULL to cdev->stats. Then a later access to sysfs
would lead to NULL derference.


> stats->time_in_state = (ktime_t *)(stats + 1);
> stats->trans_table = (unsigned int *)(stats->time_in_state + states);
> @@ -930,6 +945,15 @@ static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
> stats->max_states = states;
>
> spin_lock_init(&stats->lock);

Also we must not re-initialize spinlock here at each resizing.
Rather use spinlock for switching to the new table.


thanks,

Takashi