Re: [RFC v2 1/3] PM: Add a sysfs files to represent sleep duration

From: Limonciello, Mario
Date: Tue Nov 15 2022 - 12:59:07 EST


On 11/15/2022 11:52, Rafael J. Wysocki wrote:
On Tue, Nov 15, 2022 at 6:27 PM Limonciello, Mario
<mario.limonciello@xxxxxxx> wrote:

On 11/15/2022 11:20, Raul Rangel wrote:


On Tue, Nov 15, 2022 at 9:35 AM Rafael J. Wysocki <rafael@xxxxxxxxxx
<mailto:rafael@xxxxxxxxxx>> wrote:

On Tue, Nov 15, 2022 at 4:17 PM Limonciello, Mario
<mario.limonciello@xxxxxxx <mailto:mario.limonciello@xxxxxxx>> wrote:
>
> On 11/15/2022 08:45, Rafael J. Wysocki wrote:
> > On Thu, Nov 10, 2022 at 7:49 AM Mario Limonciello
> > <mario.limonciello@xxxxxxx <mailto:mario.limonciello@xxxxxxx>>
wrote:
> >>
> >> Both AMD and Intel SoCs have a concept of reporting whether
the hardware
> >> reached a hardware sleep state over s2idle as well as how much
> >> time was spent in such a state.
> >>
> >> This information is valuable to both chip designers and system
designers
> >> as it helps to identify when there are problems with power
consumption
> >> over an s2idle cycle.
> >>
> >> To make the information discoverable, create a new sysfs file
and a symbol
> >> that drivers from supported manufacturers can use to advertise
this
> >> information. This file will only be exported when the system
supports low
> >> power idle in the ACPI table.
> >>
> >> In order to effectively use this information you will ideally
want to
> >> compare against the total duration of sleep, so export a
second sysfs file
> >> that will show total time. This file will be exported on all
systems and
> >> used both for s2idle and s3.
> >
> > Well, my first question would be how this is related to
> >
> > /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
> >
>
> This has a dependency on the platform firmware offering an ACPI LPIT
> table. I don't know how common that is.

Required for running Windows with Modern Standby AFAICS.

> As this series started from the needs on ChromeOS I would ask is
that typically populated by coreboot?

It should be, but I'd need to ask for confirmation.


It looks like Intel platforms have support for the LPIT table:
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsource.chromium.org%2Fchromiumos%2Fchromiumos%2Fcodesearch%2F%2B%2Fmain%3Asrc%2Fthird_party%2Fcoreboot%2Fsrc%2Fsoc%2Fintel%2Fcommon%2Fblock%2Facpi%2Flpit.c%3Fq%3Df%3ALPIT%2520f%3Acoreboot%26ss%3Dchromiumos&amp;data=05%7C01%7Cmario.limonciello%40amd.com%7C37e6dda56f924fe641f008dac7323c01%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638041315852648377%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=PusftlebIMFtbaMy1XkBjHFMXLjdOzt7hA%2Fm3AM7v7A%3D&amp;reserved=0 <https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsource.chromium.org%2Fchromiumos%2Fchromiumos%2Fcodesearch%2F%2B%2Fmain%3Asrc%2Fthird_party%2Fcoreboot%2Fsrc%2Fsoc%2Fintel%2Fcommon%2Fblock%2Facpi%2Flpit.c%3Fq%3Df%3ALPIT%2520f%3Acoreboot%26ss%3Dchromiumos&amp;data=05%7C01%7Cmario.limonciello%40amd.com%7C37e6dda56f924fe641f008dac7323c01%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638041315852648377%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=PusftlebIMFtbaMy1XkBjHFMXLjdOzt7hA%2Fm3AM7v7A%3D&amp;reserved=0>

For AMD, we had some patches to add _LPIL
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Freview.coreboot.org%2Fc%2Fcoreboot%2F%2B%2F52381%2F1&amp;data=05%7C01%7Cmario.limonciello%40amd.com%7C37e6dda56f924fe641f008dac7323c01%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638041315852648377%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=gUYdMWZBNVALF8Xzhgswlyw9hCUv7LQ6eomz6gfIYrk%3D&amp;reserved=0
<https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Freview.coreboot.org%2Fc%2Fcoreboot%2F%2B%2F52381%2F1&amp;data=05%7C01%7Cmario.limonciello%40amd.com%7C37e6dda56f924fe641f008dac7323c01%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638041315852648377%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=gUYdMWZBNVALF8Xzhgswlyw9hCUv7LQ6eomz6gfIYrk%3D&amp;reserved=0>
They never got merged though. We could add an LPIT table to coreboot for
AMD platforms if necessary.

_LPI I don't think makes a lot of sense on X86 today, which is why this
was sent up:
eb087f305919e ("ACPI: processor idle: Check for architectural support
for LPI")

Well, LPI has nothing to do with LPIT. [I guess this could not be
even more confusing, but that's what you get in the world of 4-letter
TLAs.]

As for LPIT - I've never seen LPIT on AMD UEFI systems either. I guess
it's an Intel specific table?

It used to be. The spec is UEFI-hosted now.


Got it.


> I would hope it's the same number that is populated in that file on
> supported systems though.

Well, which is exactly where I'm going.

Since there is one sysfs file for exposing this value already and it
is used (for example, by sleepgraph), perhaps the way to go would be
to extend this interface to systems that don't have LPIT instead of
introducing a new one possibly exposing the same value?


Ah; so since Raul confirmed coreboot on Chrome exports that maybe we
just need to add another way to populate that sysfs file for systems
without LPIT (IE AMD). I think that's a very good idea; thanks.

I think we still probably want to have a way to get the total suspend
time out programmatically though to compare to. So perhaps the other
sysfs file I had in the RFC v2 makes sense still.

Well there are trace points to get that (sleepgraph uses these too),
see Documentation/trace/events-power.rst (and you can git grep for
"machine_suspend" to find where this comes from).

I guess there could be a sysfs file in addition to them, but I'm not
sure if the extra overhead would be worth the benefit.

At least the way that I envisioned this all working was that userspace software that wanted to could query some sysfs files and figure out a percentage of time spent. If it was below a threshold users could be notified, or logs can be sent up to a server for analysis etc.

Trace points would mean that userspace software like systemd and powerd would need to turn on the tracing every time to get the raw total numbers to do such a comparison.


> > and
> >
> > /sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us
> >
>
> No relation to this one for what's in the series.
>
> >> Suggested-by: David E Box <david.e.box@xxxxxxxxx
<mailto:david.e.box@xxxxxxxxx>>
> >> Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx
<mailto:mario.limonciello@xxxxxxx>>
> >> ---
> >> Documentation/ABI/testing/sysfs-power | 17 +++++++++++
> >> include/linux/suspend.h | 4 +++
> >> kernel/power/main.c | 42
+++++++++++++++++++++++++++
> >> kernel/power/suspend.c | 2 ++
> >> kernel/time/timekeeping.c | 2 ++
> >> 5 files changed, 67 insertions(+)
> >>
> >> diff --git a/Documentation/ABI/testing/sysfs-power
b/Documentation/ABI/testing/sysfs-power
> >> index f99d433ff311..5b47cbb4dc9e 100644
> >> --- a/Documentation/ABI/testing/sysfs-power
> >> +++ b/Documentation/ABI/testing/sysfs-power
> >> @@ -413,6 +413,23 @@ Description:
> >> The /sys/power/suspend_stats/last_failed_step
file contains
> >> the last failed step in the suspend/resume path.
> >>
> >> +What: /sys/power/suspend_stats/last_hw_state_residency
> >> +Date: December 2022
> >> +Contact: Mario Limonciello <mario.limonciello@xxxxxxx
<mailto:mario.limonciello@xxxxxxx>>
> >> +Description:
> >> + The
/sys/power/suspend_stats/last_hw_state_residency file contains
> >> + the amount of time spent in a hardware sleep
state.
> >> + This attribute is only available if the system
supports
> >> + low power idle. This is measured in microseconds.
> >> +
> >> +What: /sys/power/suspend_stats/last_suspend_total
> >> +Date: December 2022
> >> +Contact: Mario Limonciello <mario.limonciello@xxxxxxx
<mailto:mario.limonciello@xxxxxxx>>
> >> +Description:
> >> + The
/sys/power/suspend_stats/last_suspend_total file contains
> >> + the total duration of the sleep cycle.
> >> + This is measured in microseconds.
> >> +
> >> What: /sys/power/sync_on_suspend
> >> Date: October 2019
> >> Contact: Jonas Meurer <jonas@xxxxxxxxxxxxxxx
<mailto:jonas@xxxxxxxxxxxxxxx>>
> >> diff --git a/include/linux/suspend.h b/include/linux/suspend.h
> >> index cfe19a028918..af343c3f8198 100644
> >> --- a/include/linux/suspend.h
> >> +++ b/include/linux/suspend.h
> >> @@ -68,6 +68,8 @@ struct suspend_stats {
> >> int last_failed_errno;
> >> int errno[REC_FAILED_NUM];
> >> int last_failed_step;
> >> + u64 last_hw_state_residency;
> >> + u64 last_suspend_total;
> >> enum suspend_stat_step failed_steps[REC_FAILED_NUM];
> >> };
> >>
> >> @@ -489,6 +491,8 @@ void restore_processor_state(void);
> >> extern int register_pm_notifier(struct notifier_block *nb);
> >> extern int unregister_pm_notifier(struct notifier_block *nb);
> >> extern void ksys_sync_helper(void);
> >> +extern void pm_set_hw_state_residency(u64 duration);
> >> +extern void pm_account_suspend_type(const struct timespec64 *t);
> >>
> >> #define pm_notifier(fn, pri) { \
> >> static struct notifier_block fn##_nb = \
> >> diff --git a/kernel/power/main.c b/kernel/power/main.c
> >> index 31ec4a9b9d70..11bd658583b0 100644
> >> --- a/kernel/power/main.c
> >> +++ b/kernel/power/main.c
> >> @@ -6,6 +6,7 @@
> >> * Copyright (c) 2003 Open Source Development Lab
> >> */
> >>
> >> +#include <linux/acpi.h>
> >> #include <linux/export.h>
> >> #include <linux/kobject.h>
> >> #include <linux/string.h>
> >> @@ -54,6 +55,19 @@ void unlock_system_sleep(unsigned int flags)
> >> }
> >> EXPORT_SYMBOL_GPL(unlock_system_sleep);
> >>
> >> +void pm_set_hw_state_residency(u64 duration)
> >> +{
> >> + suspend_stats.last_hw_state_residency = duration;
> >> +}
> >> +EXPORT_SYMBOL_GPL(pm_set_hw_state_residency);
> >> +
> >> +void pm_account_suspend_type(const struct timespec64 *t)
> >> +{
> >> + suspend_stats.last_suspend_total += (s64)t->tv_sec *
USEC_PER_SEC +
> >> + t->tv_nsec /
NSEC_PER_USEC;
> >> +}
> >> +EXPORT_SYMBOL_GPL(pm_account_suspend_type);
> >> +
> >> void ksys_sync_helper(void)
> >> {
> >> ktime_t start;
> >> @@ -377,6 +391,20 @@ static ssize_t
last_failed_step_show(struct kobject *kobj,
> >> }
> >> static struct kobj_attribute last_failed_step =
__ATTR_RO(last_failed_step);
> >>
> >> +static ssize_t last_hw_state_residency_show(struct kobject *kobj,
> >> + struct kobj_attribute *attr, char *buf)
> >> +{
> >> + return sprintf(buf, "%llu\n",
suspend_stats.last_hw_state_residency);
> >> +}
> >> +static struct kobj_attribute last_hw_state_residency =
__ATTR_RO(last_hw_state_residency);
> >> +
> >> +static ssize_t last_suspend_total_show(struct kobject *kobj,
> >> + struct kobj_attribute *attr, char *buf)
> >> +{
> >> + return sprintf(buf, "%llu\n",
suspend_stats.last_suspend_total);
> >> +}
> >> +static struct kobj_attribute last_suspend_total =
__ATTR_RO(last_suspend_total);
> >> +
> >> static struct attribute *suspend_attrs[] = {
> >> &success.attr,
> >> &fail.attr,
> >> @@ -391,12 +419,26 @@ static struct attribute *suspend_attrs[] = {
> >> &last_failed_dev.attr,
> >> &last_failed_errno.attr,
> >> &last_failed_step.attr,
> >> + &last_hw_state_residency.attr,
> >> + &last_suspend_total.attr,
> >> NULL,
> >> };
> >>
> >> +static umode_t suspend_attr_is_visible(struct kobject *kobj,
struct attribute *attr, int idx)
> >> +{
> >> + if (attr != &last_hw_state_residency.attr)
> >> + return 0444;
> >> +#ifdef CONFIG_ACPI
> >> + if (acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0)
> >> + return 0444;
> >> +#endif
> >> + return 0;
> >> +}
> >> +
> >> static const struct attribute_group suspend_attr_group = {
> >> .name = "suspend_stats",
> >> .attrs = suspend_attrs,
> >> + .is_visible = suspend_attr_is_visible,
> >> };
> >>
> >> #ifdef CONFIG_DEBUG_FS
> >> diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> >> index fa3bf161d13f..b6c4a3733212 100644
> >> --- a/kernel/power/suspend.c
> >> +++ b/kernel/power/suspend.c
> >> @@ -423,6 +423,8 @@ static int suspend_enter(suspend_state_t
state, bool *wakeup)
> >> if (suspend_test(TEST_PLATFORM))
> >> goto Platform_wake;
> >>
> >> + suspend_stats.last_suspend_total = 0;
> >> +
> >> if (state == PM_SUSPEND_TO_IDLE) {
> >> s2idle_loop();
> >> goto Platform_wake;
> >> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> >> index f72b9f1de178..e1b356787e53 100644
> >> --- a/kernel/time/timekeeping.c
> >> +++ b/kernel/time/timekeeping.c
> >> @@ -24,6 +24,7 @@
> >> #include <linux/compiler.h>
> >> #include <linux/audit.h>
> >> #include <linux/random.h>
> >> +#include <linux/suspend.h>
> >>
> >> #include "tick-internal.h"
> >> #include "ntp_internal.h"
> >> @@ -1698,6 +1699,7 @@ static void
__timekeeping_inject_sleeptime(struct timekeeper *tk,
> >> tk_set_wall_to_mono(tk,
timespec64_sub(tk->wall_to_monotonic, *delta));
> >> tk_update_sleep_time(tk, timespec64_to_ktime(*delta));
> >> tk_debug_account_sleep_time(delta);
> >> + pm_account_suspend_type(delta);
> >> }
> >>
> >> #if defined(CONFIG_PM_SLEEP) &&
defined(CONFIG_RTC_HCTOSYS_DEVICE)
> >> --
> >> 2.34.1
> >>
>