Re: [PATCH v5 1/2] thermal/debugfs: Add thermal cooling device debugfs information

From: Rafael J. Wysocki
Date: Thu Dec 28 2023 - 14:13:27 EST


On Wed, Dec 27, 2023 at 11:46 AM Daniel Lezcano
<daniel.lezcano@xxxxxxxxxx> wrote:
>
> The thermal framework does not have any debug information except a
> sysfs stat which is a bit controversial. This one allocates big chunks
> of memory for every cooling devices with a high number of states and
> could represent on some systems in production several megabytes of
> memory for just a portion of it. As the sysfs is limited to a page
> size, the output is not exploitable with large data array and gets
> truncated.
>
> The patch provides the same information than sysfs except the
> transitions are dynamically allocated, thus they won't show more
> events than the ones which actually occurred. There is no longer a
> size limitation and it opens the field for more debugging information
> where the debugfs is designed for, not sysfs.
>
> The thermal debugfs directory structure tries to stay consistent with
> the sysfs one but in a very simplified way:
>
> thermal/
> -- cooling_devices
> |-- 0
> | |-- clear
> | |-- time_in_state_ms
> | |-- total_trans
> | `-- trans_table
> |-- 1
> | |-- clear
> | |-- time_in_state_ms
> | |-- total_trans
> | `-- trans_table
> |-- 2
> | |-- clear
> | |-- time_in_state_ms
> | |-- total_trans
> | `-- trans_table
> |-- 3
> | |-- clear
> | |-- time_in_state_ms
> | |-- total_trans
> | `-- trans_table
> `-- 4
> |-- clear
> |-- time_in_state_ms
> |-- total_trans
> `-- trans_table
>
> The content of the files in the cooling devices directory is the same
> as the sysfs one except for the trans_table which has the following
> format:
>
> Transition Hits
> 1->0 246
> 0->1 246
> 2->1 632
> 1->2 632
> 3->2 98
> 2->3 98
>
> Signed-off-by: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>
> ---
> Changelog:
> - v5
> - Removed a spurious change in thermal_helper.c, missed to remove it in v4 (Rafael)

This one LGTM now.

I've only spotted a few very minor things that can be fixed up while
applying the patch.

> - v4
> - Fixed kerneldoc description ordering (Rafael)
> - Fixed comment (Rafael)
> - Renamed s/cdev_value/cdev_record/ (Rafael)
> - Used union instead of a common 'value' field in cdev_record (Rafael)
> - Renamed s/cdev/cdev_dbg/ for clarity (Rafael)
> - Renamed s/dfs/thermal_dbg/ for clarity (Rafael)
> - Renamed s/list/lists/ in the place where there are array of lists (Rafael)
> - Inverted initialization logic when allocating a cdev_record (Rafael)
> - Moved now = ktime_get() closer to the place where it is used (Rafael)
> - Moved two lines down to a condition (Rafael)
> - Removed strange and unexpected addition of function (Rafael)
> - v3
> - Fixed kerneldoc description (kbuild)
> - Made some functions static
> - v2
> - Added parameter names to fix kbuild report
> - Renamed 'reset' to 'clear' to avoid confusion (Rafael)
> - Fixed several typos and rephrased some sentences (Rafael)
> - Renamed structure field name s/list/node/ (Rafael)
> - Documented structures and exported functions (Rafael)
> - s/trans_list/transitions/ (Rafael)
> - s/duration_list/durations/ (Rafael)
> - Folded 'alloc' and 'insert' into a single function (Rafael)
> - s/list/lists/ as it is an array of lists (Rafael)
> - s/pos/entry/ (Rafael)
> - Introduced a locking in the 'clear' callback function (Rafael)
> - s/to/new_state/ and s/from/old_state/ (Rafael)
> - Added some comments in thermal_debug_cdev_transition() (Rafael)
> - Explained why char[11] (Rafael)
> - s/Hits/Occurrences/ (Rafael)
> - s/Time/Residency/ (Rafael)
> - Constified structure pointer passed to thermal_debug_cdev_transition()
> - s/thermal_debug_cdev_transition()/thermal_debug_cdev_state_update()/
> - v1 (from RFC):
> - Fixed typo "occurred"
> - Changed Kconfig option name and description
> - Removed comment in the Makefile
> - Renamed exported function name s/debugfs/debug/
> - Replaced thermal_debug_cdev_[unregister|register] by [add|remove]
> ---