Re: [PATCH v7] PM: sleep: Expose last succeeded resumed timestamp in sysfs

From: Google
Date: Tue Jan 30 2024 - 09:52:12 EST


Hi Rafael,

Thanks for the comment!

On Thu, 25 Jan 2024 21:19:07 +0100
"Rafael J. Wysocki" <rafael@xxxxxxxxxx> wrote:

> On Thu, Jan 25, 2024 at 1:43 AM Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
> >
> > On Mon, 22 Jan 2024 18:08:22 -0800
> > Brian Norris <briannorris@xxxxxxxxxxxx> wrote:
> >
> > > On Fri, Jan 19, 2024 at 1:08 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
> > > > On Wed, Jan 17, 2024 at 1:07 AM Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
> > > > >
> > > > > Gently ping,
> > > > >
> > > > > I would like to know this is enough or I should add more info/update.
> > > >
> > > > I still am not sure what this is going to be useful for.
> > > >
> > > > Do you have a specific example?
> > >
> > > Since there seems to be some communication gap here, I'll give it a try.
> > >
> > > First, I'll paste the key phrase of its use case from the cover letter:
> > >
> > > "we would like to know how long the resume processes are taken in kernel
> > > and in user-space"
> > >
> > > This is a "system measurement" question, for use in tests (e.g., in a
> > > test lab for CI or for pre-release testing, where we suspend
> > > Chromebooks, wake them back up, and measure how long the wakeup took)
> > > or for user-reported metrics (e.g., similar statistics from real
> > > users' systems, if they've agreed to automatically report usage
> > > statistics, back to Google). We'd like to know how long it takes for a
> > > system to wake up, so we can detect when there are problems that lead
> > > to a slow system-resume experience. The user experience includes both
> > > time spent in the kernel and time spent after user space has thawed
> > > (and is spending time in potentially complex power and display manager
> > > stacks) before a Chromebook's display lights back up.
> >
> > Thanks Brian for explaining, this is correctly explained how we are
> > using this for measuring resume process duration.
> >
> > > If I understand the whole of Masami's work correctly, I believe we're
> > > taking "timestamps parsed out of dmesg" (or potentially out of ftrace,
> > > trace events, etc.) to measure the kernel side, plus "timestamp
> > > provided here in CLOCK_MONOTONIC" and "timestamp determined in our
> > > power/display managers" to measure user space.
> >
> > Yes, I decided to decouple the kernel and user space because the clock
> > subsystem is adjusted when resuming. So for the kernel, we will use
> > local clock (which is not exposed to user space), and use CLOCK_MONOTONIC
> > for the user space.
>
> The problem with this split is that you cannot know how much time
> elapses between the "successful kernel resume time" and the time when
> user space gets to resume.

Hm, let me check why.

>
> As of this patch, the kernel timestamp is taken when the kernel is
> about to thaw user space and some user space tasks may start running
> right away.

Yes. But note that this just indicates the time when the "kernel"
done the resuming process.

>
> Some other tasks, however, will wait for what happens next in the
> kernel (because it is not done with resuming yet) and some of them
> will wait until explicitly asked to resume by the resume process IIUC.

Yeah, those will be just restarted and may wait for a signal to e.g.
re-draw, re-load etc. I think those are a part of user-space resuming.

>
> Your results depend on which tasks participate in the "user
> experience", so to speak. If they are the tasks that wait to be
> kicked by the resume process, the kernel timestamp taken as per the

"kicked by the resume process" means thawing the tasks??

> above is useless for them, because there is quite some stuff that
> happens in the kernel before they will get kicked.

So your point might be, even after the timestamp, some kernel resume
process will affect? Since the system(kernel+user) resuming state is
continuously changing in parallel, it is hard to split the kernel
and user resume process?

>
> Moreover, some tasks will wait for certain device drivers to get ready
> after the rest of the system resumes and that may still take some more
> time after the kernel has returned to the process driving the system
> suspend-resume.

Oh, I thought driver resume will be done in the kernel before thawing.
Of course some wifi/bt etc. will need to be re-connect again. And
I don't care at this moment.

>
> I'm not sure if there is a single point which can be used as a "user
> space resume start" time for every task, which is why I'm not
> convinced about this patch.

Ah, OK. Yeah, that is a good point. I also think that is not useful
for every cases, but we need some reference timestamp when we start
resuming. And I think this is a better point to do. Of course if we
observe some delay, we still need more comprehensive tracing not only
application log, but this is the first step and I think the step we
can not avoid (because we don't expose the local clock to user space)

>
> BTW, there is a utility called sleepgraph that measures the kernel
> part of the system suspend-resume. It does its best to measure it
> very precisely and uses different techniques for that. Also, it is
> included in the kernel source tree. Can you please have a look at it
> and see how much there is in common between it and your tools? Maybe
> there are some interfaces that can be used in common, or maybe it
> could benefit from some interfaces that you are planning to add.

Thanks for the info! Yeah we also have some tracing tools (perfetto,
trace-cmd etc.) for analysis. Note that this timestamp will kick those
analysis tools if we find any delay based on the timestamp. I mean this
will be used for "monitoring" the trend but not for precise "analysis".

Thank you,

--
Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>