Re: [RFC] perf: need to expose sched_clock to correlate user samples with kernel samples

From: Pawel Moll
Date: Fri Feb 01 2013 - 09:18:24 EST


Hello,

I'd like to revive the topic...

On Tue, 2012-10-16 at 18:23 +0100, Peter Zijlstra wrote:
> On Tue, 2012-10-16 at 12:13 +0200, Stephane Eranian wrote:
> > Hi,
> >
> > There are many situations where we want to correlate events happening at
> > the user level with samples recorded in the perf_event kernel sampling buffer.
> > For instance, we might want to correlate the call to a function or creation of
> > a file with samples. Similarly, when we want to monitor a JVM with jitted code,
> > we need to be able to correlate jitted code mappings with perf event samples
> > for symbolization.
> >
> > Perf_events allows timestamping of samples with PERF_SAMPLE_TIME.
> > That causes each PERF_RECORD_SAMPLE to include a timestamp
> > generated by calling the local_clock() -> sched_clock_cpu() function.
> >
> > To make correlating user vs. kernel samples easy, we would need to
> > access that sched_clock() functionality. However, none of the existing
> > clock calls permit this at this point. They all return timestamps which are
> > not using the same source and/or offset as sched_clock.
> >
> > I believe a similar issue exists with the ftrace subsystem.
> >
> > The problem needs to be addressed in a portable manner. Solutions
> > based on reading TSC for the user level to reconstruct sched_clock()
> > don't seem appropriate to me.
> >
> > One possibility to address this limitation would be to extend clock_gettime()
> > with a new clock ID, e.g., CLOCK_PERF.
> >
> > However, I understand that sched_clock_cpu() provides ordering guarantees only
> > when invoked on the same CPU repeatedly, i.e., it's not globally synchronized.
> > But we already have to deal with this problem when merging samples obtained
> > from different CPU sampling buffers in per-thread mode. So this is not
> > necessarily a showstopper.
> >
> > An alternative would be to use uprobes, but that's less practical to set up.
> >
> > Anyone with better ideas?
>
> You forgot to CC the time people ;-)
>
> I've no problem with adding CLOCK_PERF (or another/better name).
>
> Thomas, John?

I've just faced the same issue: correlating an event in userspace with
data from the perf stream. To my mind, though, what I want to get is a value
returned by perf_clock() _in the current "session" context_.

Stephane didn't like the idea of opening a "fake" perf descriptor in
order to get the timestamp, but surely one must have the "session"
already running to be interested in such data in the first place? So I
think the ioctl() idea is not out of place here... How about the simple
change below?

Regards

Pawel

8<---