Re: [PATCH v7] PM: sleep: Expose last succeeded resumed timestamp in sysfs

From: Masami Hiramatsu (Google)
Date: Wed Jan 24 2024 - 19:43:35 EST


On Mon, 22 Jan 2024 18:08:22 -0800
Brian Norris <briannorris@xxxxxxxxxxxx> wrote:

> On Fri, Jan 19, 2024 at 1:08 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
> > On Wed, Jan 17, 2024 at 1:07 AM Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
> > >
> > > Gently ping,
> > >
> > > I would like to know this is enough or I should add more info/update.
> >
> > I still am not sure what this is going to be useful for.
> >
> > Do you have a specific example?
>
> Since there seems to be some communication gap here, I'll give it a try.
>
> First, I'll paste the key phrase of its use case from the cover letter:
>
> "we would like to know how long the resume processes are taken in kernel
> and in user-space"
>
> This is a "system measurement" question, for use in tests (e.g., in a
> test lab for CI or for pre-release testing, where we suspend
> Chromebooks, wake them back up, and measure how long the wakeup took)
> or for user-reported metrics (e.g., similar statistics from real
> users' systems, if they've agreed to automatically report usage
> statistics, back to Google). We'd like to know how long it takes for a
> system to wake up, so we can detect when there are problems that lead
> to a slow system-resume experience. The user experience includes both
> time spent in the kernel and time spent after user space has thawed
> (and is spending time in potentially complex power and display manager
> stacks) before a Chromebook's display lights back up.

Thanks, Brian, for explaining. This correctly describes how we are
using this to measure the duration of the resume process.

> If I understand the whole of Masami's work correctly, I believe we're
> taking "timestamps parsed out of dmesg" (or potentially out of ftrace,
> trace events, etc.) to measure the kernel side, plus "timestamp
> provided here in CLOCK_MONOTONIC" and "timestamp determined in our
> power/display managers" to measure user space.

Yes, I decided to decouple the kernel and user-space measurements because
the clock subsystem is adjusted when resuming. So for the kernel, we will
use the local clock (which is not exposed to user space), and for user
space we will use CLOCK_MONOTONIC.
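
As a rough illustration of the kernel-side half: printk timestamps come
from the local clock, so a kernel duration can be taken as the difference
between two dmesg lines. This is only a sketch, assuming the default
"[seconds.microseconds]" dmesg prefix. The marker strings and the sample
lines are hypothetical and are not the messages any real tool matches.

```python
import re

# Default printk timestamp prefix, e.g. "[  123.456789] message text".
_TS_RE = re.compile(r"^\[\s*(\d+)\.(\d+)\]\s?(.*)$")


def find_timestamp_us(lines, marker):
    """Return the timestamp (microseconds) of the first line containing
    `marker`, or None if no such line exists."""
    for line in lines:
        m = _TS_RE.match(line)
        if m and marker in m.group(3):
            # Normalize the fractional part to exactly 6 digits.
            frac = m.group(2).ljust(6, "0")[:6]
            return int(m.group(1)) * 1_000_000 + int(frac)
    return None


def kernel_resume_us(lines, start_marker, end_marker):
    """Duration between two marker lines, in microseconds (None if either
    marker is missing)."""
    start = find_timestamp_us(lines, start_marker)
    end = find_timestamp_us(lines, end_marker)
    if start is None or end is None:
        return None
    return end - start


# Hypothetical log lines, for illustration only.
SAMPLE = [
    "[  100.000000] hypothetical: resume start",
    "[  100.750000] hypothetical: resume done",
]
```

These timestamps are only compared with each other, never with a
user-space clock, which is the point of the decoupling.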

> Does that make sense? Or are we still missing something "specific" for
> you? I could give code pointers [1], as it's all open source. But I'm
> not sure browsing our metric-collection code would help understanding
> any more than these explanations.

I hope it helps you understand more about this. If you have further
questions, I will be happy to explain.

> (TBH, this all still seems kinda odd to me, since parsing dmesg isn't
> a great way to get machine-readable information. But this at least
> serves to close some gaps in measurement.)

Yeah, if I can add more to the stats, I would like to add the duration
of the kernel resume as "last_success_resume_duration". Would that be a
smarter solution? Or maybe we can also use ftrace for the kernel side. But
in any case, to measure the user-space side from user space, we need a
reference point for the start of resuming.
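
A minimal sketch of how user space could use such a reference point.
The sysfs path and the "<sec>.<nsec>" text format below are assumptions
made for illustration, not a confirmed ABI. The helper names are
hypothetical as well.

```python
import time

# ASSUMPTION: illustrative path and format, not a confirmed kernel ABI.
RESUME_TS_PATH = "/sys/power/suspend_stats/last_success_resume_time"


def parse_timestamp_ns(text):
    """Parse a "<sec>.<nsec>" string into integer nanoseconds."""
    sec, _, frac = text.strip().partition(".")
    # Normalize the fractional part to exactly 9 digits.
    return int(sec) * 1_000_000_000 + int(frac.ljust(9, "0")[:9])


def read_resume_timestamp_ns(path=RESUME_TS_PATH):
    """Read the (assumed) CLOCK_MONOTONIC resume timestamp from sysfs."""
    with open(path) as f:
        return parse_timestamp_ns(f.read())


def user_resume_ns(resume_ns, now_ns=None):
    """Time elapsed in user space since the kernel finished resuming."""
    if now_ns is None:
        now_ns = time.clock_gettime_ns(time.CLOCK_MONOTONIC)
    return now_ns - resume_ns
```

A power or display manager could call user_resume_ns() at the moment
the display lights up, since both values are in CLOCK_MONOTONIC.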

Thank you,

>
> Brian
>
> [1] e.g., https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform2/power_manager/powerd/metrics_collector.cc;l=294;drc=ce8075df179c4f8b2f4e4c4df6978d3df665c4d1


--
Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>