[PATCH RFC 1/1] perf,tool: partial callgraph and time support

From: kan . liang
Date: Sun Jun 28 2015 - 21:01:16 EST


From: Kan Liang <kan.liang@xxxxxxxxx>

When multiple events are sampled, it may not be necessary to collect
callgraphs for all of them. The sample sites are usually nearby, and
it's enough to collect the callgraphs on a reference event (such as
precise cycles or precise instructions). Similarly, we don't need
fine-grained time stamps on all events, as it's enough to have time
stamps on the regular reference events.
This patchkit adds the ability to turn off callgraphs and time stamps
per event. This in turn can reduce sampling overhead and the size of
perf.data. Furthermore, it makes collecting back traces and time stamps
possible when PEBS threshold > 1, which significantly reduces the
sampling overhead, especially for frequently occurring events
(https://lkml.org/lkml/2015/5/10/196). For example, a slower event with
a larger period collects back traces/time stamps, while other, more
frequent events run fast with multi-PEBS. The time stamps from the
slower events can be used to order the faster events, and their back
traces can give the user enough of a hint to find the right spot.
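
As an illustration of the per-event control described above, here is a
minimal standalone C sketch of the decision involved: a per-event
'callgraph=N' or 'time=N' term requests sample bits, and an explicitly
given command-line option takes precedence over that request. Every name
prefixed with sketch_ is hypothetical, and the bit values are
placeholders rather than the real PERF_SAMPLE_* constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PERF_SAMPLE_* bits; values are illustrative only. */
enum {
	SKETCH_SAMPLE_TIME         = 1u << 0,
	SKETCH_SAMPLE_CALLCHAIN    = 1u << 1,
	SKETCH_SAMPLE_STACK_USER   = 1u << 2,
	SKETCH_SAMPLE_BRANCH_STACK = 1u << 3,
};

/* Values accepted by the 'callgraph' term: 0 none, 1 FP, 2 dwarf, 3 LBR. */
enum sketch_mode { SKETCH_NONE, SKETCH_FP, SKETCH_DWARF, SKETCH_LBR };

/* Command-line state: each value plus a flag saying whether the user set it. */
struct sketch_opts {
	bool sample_time, sample_time_set;
	bool callgraph, callgraph_set;
};

/* Translate a 'callgraph=N' term into sample bits; returns -1 for an unknown N. */
static int sketch_callgraph_term(uint64_t n, uint64_t *sample_type)
{
	assert(sample_type != NULL);
	switch (n) {
	case SKETCH_NONE:
		break;
	case SKETCH_FP:
		*sample_type |= SKETCH_SAMPLE_CALLCHAIN;
		break;
	case SKETCH_DWARF:
		*sample_type |= SKETCH_SAMPLE_CALLCHAIN | SKETCH_SAMPLE_STACK_USER;
		break;
	case SKETCH_LBR:
		*sample_type |= SKETCH_SAMPLE_CALLCHAIN | SKETCH_SAMPLE_BRANCH_STACK;
		break;
	default:
		return -1;
	}
	return 0;
}

/* An explicit command-line option wins; otherwise the per-event bit decides. */
static bool sketch_want_callgraph(const struct sketch_opts *o, uint64_t sample_type)
{
	return o->callgraph_set ? o->callgraph
				: (sample_type & SKETCH_SAMPLE_CALLCHAIN) != 0;
}

static bool sketch_want_time(const struct sketch_opts *o, uint64_t sample_type)
{
	return o->sample_time_set ? o->sample_time
				  : (sample_type & SKETCH_SAMPLE_TIME) != 0;
}
```

With no command-line option set, an event that carried 'callgraph=1'
would get a callgraph and an event that carried 'callgraph=0,time=0'
would get neither a callgraph nor a time stamp.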

Here are some examples and test results.

1. Comparing the elapsed time and perf.data size from "kernbench -M -H".

The test command for FULL callgraph and time support.
"perf record -e
'{cpu/cpu-cycles,period=100000/,cpu/instructions,period=20000/p}'
--call-graph fp --time"

The test command for PARTIAL callgraph and time support.
"perf record -e
'{cpu/cpu-cycles,callgraph=1,time=1,period=100000/,
cpu/instructions,callgraph=0,time=0,period=20000/p}'"

The elapsed time for FULL is 24.3 s, while for PARTIAL it is 16.9 s.
The perf.data size for FULL is 22.1 GB, while for PARTIAL it is 12.4 GB.
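
To put the two measurements above into relative terms, the small C
helper below computes the percentage saved by PARTIAL versus FULL. The
function name reduction_pct is hypothetical and is not part of perf.

```c
#include <assert.h>

/* Relative reduction of 'partial' versus 'full', in percent. */
static double reduction_pct(double full, double partial)
{
	assert(full > 0.0);
	return (full - partial) / full * 100.0;
}
```

Called with the figures above, reduction_pct(24.3, 16.9) comes to
roughly 30.5 for the elapsed time and reduction_pct(22.1, 12.4) to
roughly 43.9 for the perf.data size.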

2. Comparing the perf.data size and callgraph results.

The test command for FULL callgraph and time support.
"perf record -e
'{cpu/cpu-cycles,period=100000/pp,cpu/instructions,period=20000/p}'
--call-graph fp -- ./tchain_edit"

The test command for PARTIAL callgraph and time support.
"perf record -e
'{cpu/cpu-cycles,callgraph=1,time=1,period=100000/pp,
cpu/instructions,callgraph=0,time=0,period=20000/p}'
-- ./tchain_edit"

The perf.data size for FULL is 43.2 MB, while for PARTIAL it is 21.1 MB.
The callgraphs are roughly the same.

The callgraph from FULL
# Samples: 87K of event
'cpu/cpu-cycles,callgraph=1,time=1,period=100000/pp'
# Event count (approx.): 8760000000
#
# Children Self Command Shared Object Symbol
# ........ ........ ........... ..................
..........................................
#
99.98% 0.00% tchain_edit libc-2.15.so [.]
__libc_start_main
|
---__libc_start_main

99.97% 0.00% tchain_edit tchain_edit [.] main
|
---main
__libc_start_main

99.97% 0.00% tchain_edit tchain_edit [.] f1
|
---f1
main
__libc_start_main

99.85% 87.01% tchain_edit tchain_edit [.] f3
|
---f3
|
|--99.74%-- f2
| f1
| main
| __libc_start_main
--0.26%-- [...]
99.71% 0.12% tchain_edit tchain_edit [.] f2
|
---f2
f1
main
__libc_start_main

The callgraph from PARTIAL
# Samples: 417K of event
'cpu/instructions,callgraph=0,time=0,period=20000/p'
# Event count (approx.): 8346980000
#
# Children Self Command Shared Object Symbol
# ........ ........ ........... ................
..........................................
#
98.82% 0.00% tchain_edit libc-2.15.so [.]
__libc_start_main
|
---__libc_start_main

98.82% 0.00% tchain_edit tchain_edit [.] main
|
---main
__libc_start_main

98.82% 0.00% tchain_edit tchain_edit [.] f1
|
---f1
main
__libc_start_main

98.82% 98.28% tchain_edit tchain_edit [.] f3
|
---f3
|
|--0.53%-- f2
| f1
| main
| __libc_start_main
|
|--0.01%-- f1
| main
| __libc_start_main
--99.46%-- [...]
97.63% 0.03% tchain_edit tchain_edit [.] f2
|
---f2
f1
main
__libc_start_main

7.13% 0.03% tchain_edit [kernel.vmlinux] [k] do_nmi
|
---do_nmi
end_repeat_nmi
f3
f2
f1
main
__libc_start_main

Signed-off-by: Kan Liang <kan.liang@xxxxxxxxx>
---
tools/perf/Documentation/perf-record.txt | 13 ++++++++
tools/perf/builtin-record.c | 7 ++--
tools/perf/perf.h | 2 ++
tools/perf/util/evsel.c | 55 ++++++++++++++++++++++++++++++--
tools/perf/util/parse-events.c | 33 +++++++++++++++++++
tools/perf/util/parse-events.h | 3 ++
tools/perf/util/parse-events.l | 3 ++
tools/perf/util/parse-options.c | 2 ++
tools/perf/util/parse-options.h | 4 +++
9 files changed, 116 insertions(+), 6 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 9b9d9d0..f945b01 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -45,6 +45,19 @@ OPTIONS
param1 and param2 are defined as formats for the PMU in:
/sys/bus/event_sources/devices/<pmu>/format/*

+ There are also some params which are not defined in .../<pmu>/format/*.
+ These params can be used to set event defaults.
+ Here is a list of the params.
+ - 'period': Set event sampling period
+ - 'callgraph': Disable/enable callgraph. Acceptable values are
+ 1 for FP mode, 2 for dwarf mode, 3 for LBR mode,
+ 0 for disabling callgraph.
+ - 'stack_size': user stack size for dwarf mode
+ - 'time': Disable/enable time stamping. Acceptable values are
+ 1 for enabling time stamping. 0 for disabling time stamping.
+ Note: If user explicitly sets options which conflict with the params,
+ the value set by the params will be overridden.
+
- a hardware breakpoint event in the form of '\mem:addr[/len][:access]'
where addr is the address in memory you want to break in.
Access is the memory access type (read, write, execute) it can
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index de165a1..c270993 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1018,10 +1018,10 @@ struct option __record_options[] = {
record__parse_mmap_pages),
OPT_BOOLEAN(0, "group", &record.opts.group,
"put the counters into a counter group"),
- OPT_CALLBACK_NOOPT('g', NULL, &record.opts,
+ OPT_CALLBACK_NOOPT_SET('g', NULL, &record.opts, &record.opts.callgraph_set,
NULL, "enables call-graph recording" ,
&record_callchain_opt),
- OPT_CALLBACK(0, "call-graph", &record.opts,
+ OPT_CALLBACK_SET(0, "call-graph", &record.opts, &record.opts.callgraph_set,
"mode[,dump_size]", record_callchain_help,
&record_parse_callchain_opt),
OPT_INCR('v', "verbose", &verbose,
@@ -1030,7 +1030,8 @@ struct option __record_options[] = {
OPT_BOOLEAN('s', "stat", &record.opts.inherit_stat,
"per thread counts"),
OPT_BOOLEAN('d', "data", &record.opts.sample_address, "Record the sample addresses"),
- OPT_BOOLEAN('T', "timestamp", &record.opts.sample_time, "Record the sample timestamps"),
+ OPT_BOOLEAN_SET('T', "timestamp", &record.opts.sample_time,
+ &record.opts.sample_time_set, "Sample timestamps"),
OPT_BOOLEAN('P', "period", &record.opts.period, "Record the sample period"),
OPT_BOOLEAN('n', "no-samples", &record.opts.no_samples,
"don't sample"),
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 4a5827ff..9ba02e0 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -51,6 +51,8 @@ struct record_opts {
bool sample_address;
bool sample_weight;
bool sample_time;
+ bool sample_time_set;
+ bool callgraph_set;
bool period;
bool sample_intr_regs;
bool running_time;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 2936b30..017dd7d 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -619,10 +619,58 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts)
struct perf_event_attr *attr = &evsel->attr;
int track = evsel->tracking;
bool per_cpu = opts->target.default_per_cpu && !opts->target.per_thread;
+ bool sample_time = opts->sample_time;
+ bool callgraph = callchain_param.enabled;

attr->sample_id_all = perf_missing_features.sample_id_all ? 0 : 1;
attr->inherit = !opts->no_inherit;

+ /*
+ * If user doesn't explicitly set callgraph or time option,
+ * let event attribute decide.
+ */
+ if (!opts->callgraph_set) {
+ if (attr->sample_type & PERF_SAMPLE_CALLCHAIN) {
+ callgraph = true;
+ if (attr->sample_type & PERF_SAMPLE_STACK_USER) {
+ callchain_param.record_mode = CALLCHAIN_DWARF;
+ if (attr->sample_stack_user)
+ callchain_param.dump_size = attr->sample_stack_user;
+ else
+ callchain_param.dump_size = 8192;
+ } else if (attr->sample_type & PERF_SAMPLE_BRANCH_STACK)
+ callchain_param.record_mode = CALLCHAIN_LBR;
+ else
+ callchain_param.record_mode = CALLCHAIN_FP;
+ } else
+ callgraph = false;
+ }
+
+ if (!opts->sample_time_set) {
+ if (attr->sample_type & PERF_SAMPLE_TIME)
+ sample_time = true;
+ else
+ sample_time = false;
+ }
+
+ /*
+ * Event parsing doesn't check availability, so clear the
+ * callgraph bits that event parsing may have set and let
+ * the following code check and set them again if available.
+ *
+ * Also, the sample size may have been miscalculated,
+ * because event parsing may set PERF_SAMPLE_TIME.
+ * Remove the size that was added in perf_evsel__init.
+ */
+ attr->sample_type &= ~(PERF_SAMPLE_CALLCHAIN |
+ PERF_SAMPLE_STACK_USER |
+ PERF_SAMPLE_BRANCH_STACK);
+
+ if (attr->sample_type & PERF_SAMPLE_TIME) {
+ attr->sample_type &= ~PERF_SAMPLE_TIME;
+ evsel->sample_size -= sizeof(u64);
+ }
+
perf_evsel__set_sample_bit(evsel, IP);
perf_evsel__set_sample_bit(evsel, TID);

@@ -688,7 +736,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts)
if (perf_evsel__is_function_event(evsel))
evsel->attr.exclude_callchain_user = 1;

- if (callchain_param.enabled && !evsel->no_aux_samples)
+ if (callgraph && !evsel->no_aux_samples)
perf_evsel__config_callgraph(evsel, opts);

if (opts->sample_intr_regs) {
@@ -705,13 +753,14 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts)
/*
* When the user explicitely disabled time don't force it here.
*/
- if (opts->sample_time &&
+ if (sample_time &&
(!perf_missing_features.sample_id_all &&
(!opts->no_inherit || target__has_cpu(&opts->target) || per_cpu)))
perf_evsel__set_sample_bit(evsel, TIME);

if (opts->raw_samples && !evsel->no_aux_samples) {
- perf_evsel__set_sample_bit(evsel, TIME);
+ if (sample_time)
+ perf_evsel__set_sample_bit(evsel, TIME);
perf_evsel__set_sample_bit(evsel, RAW);
perf_evsel__set_sample_bit(evsel, CPU);
}
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 09f8d23..40ece53 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -19,6 +19,7 @@
#include "thread_map.h"
#include "cpumap.h"
#include "asm/bug.h"
+#include "callchain.h"

#define MAX_NAME_LEN 100

@@ -598,6 +599,38 @@ do { \
* attr->branch_sample_type = term->val.num;
*/
break;
+ case PARSE_EVENTS__TERM_TYPE_CALLGRAPH:
+ CHECK_TYPE_VAL(NUM);
+ switch (term->val.num) {
+ case CALLCHAIN_FP:
+ attr->sample_type |= PERF_SAMPLE_CALLCHAIN;
+ break;
+ case CALLCHAIN_DWARF:
+ attr->sample_type |= PERF_SAMPLE_CALLCHAIN |
+ PERF_SAMPLE_STACK_USER;
+ break;
+ case CALLCHAIN_LBR:
+ attr->sample_type |= PERF_SAMPLE_CALLCHAIN |
+ PERF_SAMPLE_BRANCH_STACK;
+ break;
+ case CALLCHAIN_NONE:
+ break;
+ default:
+ return -EINVAL;
+ }
+ break;
+ case PARSE_EVENTS__TERM_TYPE_STACKSIZE:
+ CHECK_TYPE_VAL(NUM);
+ attr->sample_stack_user = term->val.num;
+ break;
+ case PARSE_EVENTS__TERM_TYPE_TIME:
+ CHECK_TYPE_VAL(NUM);
+
+ if (term->val.num > 1)
+ return -EINVAL;
+ if (term->val.num == 1)
+ attr->sample_type |= PERF_SAMPLE_TIME;
+ break;
case PARSE_EVENTS__TERM_TYPE_NAME:
CHECK_TYPE_VAL(STR);
break;
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 131f29b..cceceec 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -62,6 +62,9 @@ enum {
PARSE_EVENTS__TERM_TYPE_NAME,
PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD,
PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE,
+ PARSE_EVENTS__TERM_TYPE_CALLGRAPH,
+ PARSE_EVENTS__TERM_TYPE_STACKSIZE,
+ PARSE_EVENTS__TERM_TYPE_TIME,
};

struct parse_events_term {
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 13cef3c..d527eb6 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -183,6 +183,9 @@ config2 { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG2); }
name { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NAME); }
period { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD); }
branch_type { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE); }
+callgraph { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CALLGRAPH); }
+stack_size { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_STACKSIZE); }
+time { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_TIME); }
, { return ','; }
"/" { BEGIN(INITIAL); return '/'; }
{name_minus} { return str(yyscanner, PE_NAME); }
diff --git a/tools/perf/util/parse-options.c b/tools/perf/util/parse-options.c
index 01626be..064385f 100644
--- a/tools/perf/util/parse-options.c
+++ b/tools/perf/util/parse-options.c
@@ -140,6 +140,8 @@ static int get_value(struct parse_opt_ctx_t *p,
return err;

case OPTION_CALLBACK:
+ if (opt->set)
+ *(bool *)opt->set = true;
if (unset)
return (*opt->callback)(opt, NULL, 1) ? (-1) : 0;
if (opt->flags & PARSE_OPT_NOARG)
diff --git a/tools/perf/util/parse-options.h b/tools/perf/util/parse-options.h
index 367d8b8..2bec32e 100644
--- a/tools/perf/util/parse-options.h
+++ b/tools/perf/util/parse-options.h
@@ -132,8 +132,12 @@ struct option {
{ .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l), .value = (v), .argh = "time", .help = (h), .callback = parse_opt_approxidate_cb }
#define OPT_CALLBACK(s, l, v, a, h, f) \
{ .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l), .value = (v), (a), .help = (h), .callback = (f) }
+#define OPT_CALLBACK_SET(s, l, v, os, a, h, f) \
+ { .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l), .value = (v), (a), .help = (h), .callback = (f), .set = check_vtype(os, bool *) }
#define OPT_CALLBACK_NOOPT(s, l, v, a, h, f) \
{ .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l), .value = (v), (a), .help = (h), .callback = (f), .flags = PARSE_OPT_NOARG }
+#define OPT_CALLBACK_NOOPT_SET(s, l, v, os, a, h, f) \
+ { .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l), .value = (v), (a), .help = (h), .callback = (f), .flags = PARSE_OPT_NOARG, .set = check_vtype(os, bool *) }
#define OPT_CALLBACK_DEFAULT(s, l, v, a, h, f, d) \
{ .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l), .value = (v), (a), .help = (h), .callback = (f), .defval = (intptr_t)d, .flags = PARSE_OPT_LASTARG_DEFAULT }
#define OPT_CALLBACK_DEFAULT_NOOPT(s, l, v, a, h, f, d) \
--
1.8.3.1
