Re: [PATCH v7 1/4] PM: Add sysfs files to represent time spent in hardware sleep state

From: Hans de Goede
Date: Wed Apr 12 2023 - 04:50:03 EST


Hi all,

On 4/12/23 02:58, Box, David E wrote:
> Hi,
>
> On Tue, 2023-04-11 at 21:49 +0000, Limonciello, Mario wrote:
>> [Public]
>>
>>>
>>> On 4/11/23 23:17, Mario Limonciello wrote:
>>>> Userspace can't easily discover how much of a sleep cycle was spent in a
>>>> hardware sleep state without using kernel tracing and vendor specific
>>>> sysfs or debugfs files.
>>>>
>>>> To make this information more discoverable, introduce two new sysfs files:
>>>> 1) The time spent in a hw sleep state for last cycle.
>>>> 2) The time spent in a hw sleep state since the kernel booted
>>>> Both of these files will be present only if the system supports s2idle.
>>>>
>>>> Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
>>>> ---
>>>> v6->v7:
>>>>  * Rename to max_hw_sleep (David E Box)
>>>>  * Drop overflow checks (David E Box)
>>>> v5->v6:
>>>>  * Add total attribute as well
>>>>  * Change text for documentation
>>>>  * Adjust flow of is_visible callback.
>>>>  * If overflow was detected in total attribute return -EOVERFLOW
>>>>  * Rename symbol
>>>>  * Add stub for symbol for builds without CONFIG_PM_SLEEP
>>>> v4->v5:
>>>>  * Provide time in microseconds instead of percent. Userspace can convert
>>>>    this if desirable.
>>>> ---
>>>>  Documentation/ABI/testing/sysfs-power | 24 ++++++++++++++++
>>>>  include/linux/suspend.h               |  5 ++++
>>>>  kernel/power/main.c                   | 40 +++++++++++++++++++++++++++
>>>>  3 files changed, 69 insertions(+)
>>>>
>>>> diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power
>>>> index f99d433ff311..0723b4dadfbe 100644
>>>> --- a/Documentation/ABI/testing/sysfs-power
>>>> +++ b/Documentation/ABI/testing/sysfs-power
>>>> @@ -413,6 +413,30 @@ Description:
>>>>                 The /sys/power/suspend_stats/last_failed_step file contains
>>>>                 the last failed step in the suspend/resume path.
>>>>
>>>> +What:          /sys/power/suspend_stats/last_hw_sleep
>>>> +Date:          June 2023
>>>> +Contact:       Mario Limonciello <mario.limonciello@xxxxxxx>
>>>> +Description:
>>>> +               The /sys/power/suspend_stats/last_hw_sleep file
>>>> +               contains the duration of time spent in a hardware sleep
>>>> +               state in the most recent system suspend-resume cycle.
>>>> +               This number is measured in microseconds.
>>>> +
>>>> +               NOTE: Limitations in the size of the hardware counters may
>>>> +               cause this value to be inaccurate in longer sleep cycles.
>>>
>>> Hmm I thought that the plan was to add a separate sysfs attr with
>>> the max time that the hw could represent here, so that userspace
>>> actually knows what constitutes a "longer sleep cycle"?
>>>
>>> That would seem better than such a handwavy comment in the ABI docs?
>>
>> I obviously misunderstood what you were suggesting.
>> I don't believe we have a way to programmatically determine what the hardware
>> internally uses for its counter to know this.
>>
>> So it would need to be a table of some sort recording which value a given
>> system can support.  If we do that, we can actually know whether to return an
>> error code like -EOVERFLOW or -EINVAL too if the suspend was too long.
>>
>> I would need Intel guys to share this information though which systems have
>> which size of counters to make this happen.
>
> For Intel all the s0ix counters are 32 bit. If the maximum sleep time is
> reported in microseconds it's just
>
> ((1UL << 32) - 1) * slp_s0_res_counter_step,
>
> where slp_s0_res_counter_step is the platform specific counter granularity in
> microseconds. There are some platform specific tweaks (of course). If you
> provide a function to call, I can write the patch for intel_pmc_core.

FWIW the above plan sounds good to me.
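
To make the arithmetic concrete, here is a minimal sketch of David's
formula in plain userspace C. It is only an illustration, not the
intel_pmc_core patch: the helper names max_hw_sleep_us() and
hw_sleep_may_have_wrapped() are hypothetical, and the step is whatever
granularity (in microseconds) the platform reports. The only change to
the formula is widening to 64 bits before the multiply, so it does not
depend on the width of unsigned long.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): largest residency, in
 * microseconds, that a 32-bit counter can report before it wraps.
 * step_us is the platform-specific counter granularity in microseconds.
 */
static uint64_t max_hw_sleep_us(uint32_t step_us)
{
	/*
	 * Widen to 64 bits before multiplying so the product cannot wrap.
	 * UINT32_MAX equals (1 << 32) - 1 without needing a 64-bit shift.
	 */
	return (uint64_t)UINT32_MAX * step_us;
}

/*
 * Hypothetical helper: returns 1 if a suspend that lasted slept_us
 * microseconds is longer than the counter can represent, 0 otherwise.
 */
static int hw_sleep_may_have_wrapped(uint64_t slept_us, uint32_t step_us)
{
	return slept_us > max_hw_sleep_us(step_us);
}
```

With a step of 1 us the limit is 4294967295 us, a little under 72
minutes, which is the kind of number userspace would want to read from
the extra attribute before trusting last_hw_sleep.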

Regards,

Hans

>>>> +What:          /sys/power/suspend_stats/max_hw_sleep
>>>> +Date:          June 2023
>>>> +Contact:       Mario Limonciello <mario.limonciello@xxxxxxx>
>>>> +Description:
>>>> +               The /sys/power/suspend_stats/max_hw_sleep file
>>>> +               contains the aggregate of time spent in a hardware sleep
>>>> +               state since the kernel was booted. This number
>>>> +               is measured in microseconds.
>>>> +
>>>> +               NOTE: Limitations in the size of the hardware counters may
>>>> +               cause this value to be inaccurate in longer sleep cycles.
>>>
>>> Maybe "total_hw_sleep" instead of "max_hw_sleep"? Also, to me "max"
>>> sounds like the limit of the longest sleep the hw counters can
>>> register, so that is a bit confusing with the discussion about those
>>> limits.
>>
>> total_hw_sleep is actually what was in v6 and max_hw_sleep is what was suggested.
>> That's why I got confused about what you guys meant.
>
> Sorry, I meant max_hw_sleep for the additional attribute as Hans mentioned.
>
> David
>
>>
>>>
>>> Regards,
>>>
>>> Hans
>>>
>>>
>>>
>>>> +
>>>>  What:          /sys/power/sync_on_suspend
>>>>  Date:          October 2019
>>>>  Contact:       Jonas Meurer <jonas@xxxxxxxxxxxxxxx>
>>>> diff --git a/include/linux/suspend.h b/include/linux/suspend.h
>>>> index cfe19a028918..819e9677fd10 100644
>>>> --- a/include/linux/suspend.h
>>>> +++ b/include/linux/suspend.h
>>>> @@ -68,6 +68,8 @@ struct suspend_stats {
>>>>         int     last_failed_errno;
>>>>         int     errno[REC_FAILED_NUM];
>>>>         int     last_failed_step;
>>>> +       u64     last_hw_sleep;
>>>> +       u64     max_hw_sleep;
>>>>         enum suspend_stat_step  failed_steps[REC_FAILED_NUM];
>>>>  };
>>>>
>>>> @@ -489,6 +491,7 @@ void restore_processor_state(void);
>>>>  extern int register_pm_notifier(struct notifier_block *nb);
>>>>  extern int unregister_pm_notifier(struct notifier_block *nb);
>>>>  extern void ksys_sync_helper(void);
>>>> +extern void pm_report_hw_sleep_time(u64 t);
>>>>
>>>>  #define pm_notifier(fn, pri) {                         \
>>>>         static struct notifier_block fn##_nb =                  \
>>>> @@ -526,6 +529,8 @@ static inline int unregister_pm_notifier(struct notifier_block *nb)
>>>>         return 0;
>>>>  }
>>>>
>>>> +static inline void pm_report_hw_sleep_time(u64 t) {};
>>>> +
>>>>  static inline void ksys_sync_helper(void) {}
>>>>
>>>>  #define pm_notifier(fn, pri)   do { (void)(fn); } while (0)
>>>> diff --git a/kernel/power/main.c b/kernel/power/main.c
>>>> index 31ec4a9b9d70..a5546b91ecc9 100644
>>>> --- a/kernel/power/main.c
>>>> +++ b/kernel/power/main.c
>>>> @@ -6,6 +6,7 @@
>>>>   * Copyright (c) 2003 Open Source Development Lab
>>>>   */
>>>>
>>>> +#include <linux/acpi.h>
>>>>  #include <linux/export.h>
>>>>  #include <linux/kobject.h>
>>>>  #include <linux/string.h>
>>>> @@ -83,6 +84,13 @@ int unregister_pm_notifier(struct notifier_block *nb)
>>>>  }
>>>>  EXPORT_SYMBOL_GPL(unregister_pm_notifier);
>>>>
>>>> +void pm_report_hw_sleep_time(u64 t)
>>>> +{
>>>> +       suspend_stats.last_hw_sleep = t;
>>>> +       suspend_stats.max_hw_sleep += t;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(pm_report_hw_sleep_time);
>>>> +
>>>>  int pm_notifier_call_chain_robust(unsigned long val_up, unsigned long val_down)
>>>>  {
>>>>         int ret;
>>>> @@ -377,6 +385,22 @@ static ssize_t last_failed_step_show(struct kobject *kobj,
>>>>  }
>>>>  static struct kobj_attribute last_failed_step = __ATTR_RO(last_failed_step);
>>>>
>>>> +static ssize_t last_hw_sleep_show(struct kobject *kobj,
>>>> +               struct kobj_attribute *attr, char *buf)
>>>> +{
>>>> +       return sysfs_emit(buf, "%llu\n", suspend_stats.last_hw_sleep);
>>>> +}
>>>> +static struct kobj_attribute last_hw_sleep = __ATTR_RO(last_hw_sleep);
>>>> +
>>>> +static ssize_t max_hw_sleep_show(struct kobject *kobj,
>>>> +               struct kobj_attribute *attr, char *buf)
>>>> +{
>>>> +       if (suspend_stats.max_hw_sleep == -EOVERFLOW)
>>>> +               return suspend_stats.max_hw_sleep;
>>>> +       return sysfs_emit(buf, "%llu\n", suspend_stats.max_hw_sleep);
>>>> +}
>>>> +static struct kobj_attribute max_hw_sleep = __ATTR_RO(max_hw_sleep);
>>>> +
>>>>  static struct attribute *suspend_attrs[] = {
>>>>         &success.attr,
>>>>         &fail.attr,
>>>> @@ -391,12 +415,28 @@ static struct attribute *suspend_attrs[] = {
>>>>         &last_failed_dev.attr,
>>>>         &last_failed_errno.attr,
>>>>         &last_failed_step.attr,
>>>> +       &last_hw_sleep.attr,
>>>> +       &max_hw_sleep.attr,
>>>>         NULL,
>>>>  };
>>>>
>>>> +static umode_t suspend_attr_is_visible(struct kobject *kobj, struct attribute *attr, int idx)
>>>> +{
>>>> +       if (attr != &last_hw_sleep.attr &&
>>>> +           attr != &max_hw_sleep.attr)
>>>> +               return 0444;
>>>> +
>>>> +#ifdef CONFIG_ACPI
>>>> +       if (acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0)
>>>> +               return 0444;
>>>> +#endif
>>>> +       return 0;
>>>> +}
>>>> +
>>>>  static const struct attribute_group suspend_attr_group = {
>>>>         .name = "suspend_stats",
>>>>         .attrs = suspend_attrs,
>>>> +       .is_visible = suspend_attr_is_visible,
>>>>  };
>>>>
>>>>  #ifdef CONFIG_DEBUG_FS
>