Re: [PATCH v2 1/2] perf: Add sampling of the raw monotonic clock

From: Namhyung Kim
Date: Fri Sep 26 2014 - 02:16:50 EST




Hi Pawel,

On Thu, 25 Sep 2014 11:49:58 +0100, Pawel Moll wrote:
> On Wed, 2014-09-24 at 06:41 +0100, Namhyung Kim wrote:
>> Hi Pawel,
>>
>> On Tue, 23 Sep 2014 18:03:06 +0100, Pawel Moll wrote:
>> > This patch adds an option to sample raw monotonic clock
>> > value with any perf event, with the the aim of allowing
>> > time correlation between data coming from perf and
>> > additional performance-related information generated in
>> > userspace.
>> >
>> > In order to correlate timestamps in perf data stream
>> > with events happening in userspace (be it JITed debug
>> > symbols or hwmon-originating environment data), user
>> > requests a more or less periodic event (sched_switch
>> > trace event of a hrtimer-based cpu-clock being the
>> > most obvious examples) with PERF_SAMPLE_TIME *and*
>> > PERF_SAMPLE_CLOCK_RAW_MONOTONIC and stamps
>> > user-originating data with values obtained from
>> > clock_gettime(CLOCK_MONOTONIC_RAW). Then, during
>> > analysis, one looks at the perf events immediately
>> > preceding and following (in terms of the
>> > clock_raw_monotonic sample) the userspace event and
>> > does simple linear approximation to get the equivalent
>> > perf time.
>> >
>> > perf event user event
>> > -----O--------------+-------------O------> t_mono
>> > : | :
>> > : V :
>> > -----O----------------------------O------> t_perf
>>
>> Probably a dumb question: why not make PERF_SAMPLE_TIME being monotonic
>> clock instead of adding a new PERF_SAMPLE_CLOCK_XXX flag?
>
> It's a valid question. And it was asked before, in this thread:
>
> http://thread.gmane.org/gmane.linux.kernel/1611683
>
> A summary of the answer would be:
>
>> On Wed, 2013-12-11 at 12:07 +0000, Ingo Molnar wrote:
>> > * John Stultz <john.stultz-QSEj5FYQhm4dnm+yROfE0A@xxxxxxxxxxxxxxxx> wrote:
>> >
>> > > [...]
>> > >
>> > > I'd much rather see perf export CLOCK_MONOTONIC_RAW timestamps,
>> > > since that clockid is well defined. [...]
>> >
>> > So the problem with that clock is that it does the following for every
>> > timestamp:
>> >
>> > cycle_now = clock->read(clock);
>> >
>> > ... which is impossibly slow if something like the HPET is used, which
>> > is rather common - so this is a non-starter to timestamp perf events
>> > with. We use the scheduler clock as a reasonable compromise between
>> > scalability and clock globality.

Hmm.. but wouldn't it be up to user? If one suffers from a slow clock
she will use default and fast one. Well, if perf can know it'd be bad
for performance, it can warn users about the issue.


>
> Now, for your other comment:
>
>> Maybe we can
>> add a new ioctl command like PERF_EVENT_IOC_SET_CLOCK so that one can
>> pass a clock id.
>
> Did you mean selecting a time source for PERF_SAMPLE_TIME (so we don't
> need two timestamps in a sample)?

Yes.


> It would be doable, I guess, but what
> if someone *wants* to have sched clock as the timestamps source (because
> it's cheap) but still be able to correlate them with userspace? In this
> case two separate timestamps are required to do the approximation.

But by collecting two timestamps, you'll loose the win of the first
timestamp, no?


>
> Now, if you meant it to select a time source for the "other timestamp",
> let's call it in a more generic way: PERF_SAMPLE_CLOCK_VALUE this could
> work, yes. I see ALSA doing something similar (see
> SNDRV_PCM_TSTAMP_TYPE_* enum in include/uapi/sound/asound.h). One
> question would be: how does it work with groups? Does setting a
> timesource for the leader selects them for all members? I don't think
> you want a bunch of samples with different timestamp sources in the same
> buffer.

My answer would be: if you don't want it, don't do that. :)

Thanks,
Namhyung

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/