Re: [PATCH 3/3] perf record/report: Process events in order

From: Ian Munsie
Date: Mon Dec 06 2010 - 20:16:03 EST


Hi Thomas,

Excerpts from Thomas Gleixner's message of Tue Dec 07 01:41:22 +1100 2010:
> Does the patch below on top of my previous one work for you ?

Not quite. It does work on a patched kernel, but has some problems on
unpatched kernels and issues with perf report -D.

I set ordered_samples and ordering_requires_timestamps in perf report to
test this (I've included the changes below). Since you don't move the
dump_printfs like my patches, we can't set those flags unconditionally
in perf report, otherwise perf report -D would be completely broken.

That's OK, we can just set it conditionally on !dump_trace and report -D
will then always display events in file order. Personally I think it's
handy to see which order events are being processed in to track down why
samples aren't being attributed, but that's mostly important during
debugging the timestamp ordering, and perhaps not as useful otherwise.


> case PERF_RECORD_HEADER_ATTR:
> - return ops->attr(event, session);
> + /* This updates session->sample_id_all */
> + ret = ops->attr(event, session);
> + if (ops->ordering_requires_timestamps &&
> + ops->ordered_samples && !session->sample_id_all) {
> + session->ordered_samples.next_flush = ULLONG_MAX;
> + flush_sample_queue(session, ops);
> + ops->ordered_samples = false;
> + }

Makes sense. I did something similar in the report layer that I was
about to send when I saw this email, but this way we have a generic
solution for other parts of perf that might want this.
The problem here is that we only get the PERF_RECORD_HEADER_ATTR if perf
record has been piped somewhere, so running perf record <load>; perf
report on an unpatched kernel results in the COMM, MMAP, etc events
being processed last (timestamp -1ULL) and no userspace samples are
attributed at all:

$ ./perf report|cat
# Events: 1K cycles
#
# Overhead Command Shared Object Symbol
# ........ ....... ............. .....................
#
99.42% cachetest [unknown] [.] 0xf75e0ccb
0.45% cachetest [kernel] [k] page_fault
0.06% cachetest [kernel] [k] hpet_next_event
0.06% cachetest [kernel] [k] run_timer_softirq
0.01% cachetest [kernel] [k] flush_signal_handlers
0.00% cachetest [kernel] [k] native_write_msr_safe

> + /*
> + * For all non synthetized events we get the sample data

s/synthetized/synthesized/

> + event__parse_sample(event, session, &sample);
> + if (dump_trace)
> + perf_session__print_tstamp(session, event, &sample);

Moving this here after the dump_printf("%#Lx [%#x]: PERF_RECORD_%s"...
changes the output of perf report -D in a bad way. Changing the spacing
in dump_printf can make up for it, or juggle the code around some more.


How do you want to proceed? At this point either version of the patches
are pretty functionally equivelant. Mine does the perf report -D
reordering as well, but that isn't really necessary to solve the bug.
Either way we only have a few minor fixups left.


If we want to use yours, here's my v3 patch respun on top of them to
enable ordering by timestamps in record & report: