Re: [RFC v3 0/4] Make it easier to measure % in HW sleep state

From: Hans de Goede
Date: Thu Nov 17 2022 - 09:11:36 EST


Hi,

On 11/15/22 21:01, Mario Limonciello wrote:
> Sven van Ashbrook brought a patch to the kernel mailing list that
> attempted to change the reporting level of a s0ix entry issue to a
> different debugging level so that infastructure used by Google could
> better scan logs to catch problems.
>
> This approach was rejected, but during the conversation another
> suggestion was made by David E. Box to introduce some infrastructure
> into the kernel to report this information.
>
> As it's information that is reported by both AMD and Intel platforms
> over s2idle, this seems to make sense.
>
> RFC v1 and v2 introduced two new sysfs files to report the information, but
> Rafael pointed out that there was already a file that could be used on
> Intel platforms: `low_power_idle_system_residency_us`.
>
> RFC v3 creates this file for AMD platforms and also introduces another file
> that can be used to determine total sleep time:
> `/sys/power/suspend_stats/last_total`.
>
> With these two files a simple shell script can be run after suspend to
> calculate the percentage.
>
> ```
> #!/bin/sh
> total=$(cat /sys/power/suspend_stats/last_total)
> hw=$(cat /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us)
> percent=$(awk -v hw=$hw -v total=$total 'BEGIN { printf "%.2f%%", (hw/total*100) }')
> echo "Last ${total}us suspend cycle spent $percent of the time in a hardware sleep state."
> ```
>
> A sample run on an AMD platform that was just sleeping with this series on
> top of 6.1-rc5 shows the following:
> # ./compare.sh
> Last 15699838us suspend cycle spent 98.63% of the time in a hardware sleep state.
>
> Further discussion to be expected on this series:
>
> * What last_total will represent from the suspend cycle
>
> * Whether the semantics of all platforms will be the same for
> `low_power_idle_system_residency_us`
> - AMD platforms reset this counter before s2idle entry. Do Intel? Others?
>
> * Maybe the *kernel* should be responsible to do the calculation and export
> a `last_hw_sleep_percent` file instead. Platform differences can be
> abstracted then within individual drivers.

That (`last_hw_sleep_percent` file) is an interesting proposal,
I can see that being a better interface because as you say this allows
the kernel / platform-drivers to take care of any platform quirks /
weirdness, avoiding any userspace monitoring of this to possibly
give false positive warnings.

Regards,

Hans





>
> Mario Limonciello (4):
> PM: Add a sysfs file to represent the total sleep duration
> platform/x86/intel/pmc: core: Drop check_counters
> platform/x86/amd: pmc: Report duration of time in deepest hw state
> platform/x86/amd: pmc: Populate cpuidle sysfs file with hw sleep data
>
> Documentation/ABI/testing/sysfs-amd-pmc | 6 ++++++
> Documentation/ABI/testing/sysfs-power | 8 ++++++++
> drivers/platform/x86/amd/pmc.c | 27 ++++++++++++++++++++++---
> drivers/platform/x86/intel/pmc/core.c | 7 ++-----
> drivers/platform/x86/intel/pmc/core.h | 1 -
> include/linux/suspend.h | 2 ++
> kernel/power/main.c | 15 ++++++++++++++
> kernel/power/suspend.c | 2 ++
> kernel/time/timekeeping.c | 2 ++
> 9 files changed, 61 insertions(+), 9 deletions(-)
>