[PATCH v2 00/15] tracing: 'hist' triggers

From: Tom Zanussi
Date: Mon Mar 02 2015 - 11:01:19 EST


This is v2 of my previously posted 'hashtriggers' patchset [1], but
renamed to 'hist triggers' following feedback from v1.

Since then, the kernel has gained a map implementation in the form of
bpf_map. This patchset makes it a bit more generic, exports it, and
uses it as tracing_map_* (the code still lives in the bpf syscall
file, however).

A large part of the initial hash triggers implementation was devoted
to a map implementation and general-purpose hashing functions, which
have now been subsumed by the bpf maps. I've completely redone the
trigger patches themselves to work on top of tracing_map. The result
is a much simpler and easier-to-review patchset that's able to focus
more directly on the problem at hand.

The new version addresses all the comments from the previous review,
including changing the name from hash->hist, adding separate 'hist'
files for the output, and moving the examples into Documentation.

This patchset also includes a couple of new, related triggers,
enable_hist and disable_hist. They are very similar to the existing
enable_event/disable_event triggers, which automatically enable and
disable events based on a triggering condition, but they enable and
disable hist triggers instead.

The only problem with using the bpf_map implementation for this is
that it uses kmalloc internally, which causes problems when trying to
trace kmalloc itself. I'm guessing the ebpf tracing code would also
share this problem e.g. when using bpf_maps from probes on kmalloc().
This patchset attempts a solution to that problem (by adding a gfp
flag, ___GFP_NOTRACE, to check for it and changing the kmem memory
allocation tracepoints to conditional variants), but I'm not sure
it's the best way to address it.
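
To make the recursion concrete, here is a minimal userspace model of
the idea, not the kernel code from this patchset. The names
ALLOC_NOTRACE, model_alloc() and trace_alloc_event() are hypothetical;
only the ___GFP_NOTRACE flag they are modeled on comes from the series.
The allocator fires its "tracepoint" only when the caller has not opted
out, and the handler opts out for its own allocations:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical flag, modeled on ___GFP_NOTRACE from this series. */
#define ALLOC_NOTRACE 0x1u

/* Number of times the allocation "tracepoint" has fired. */
static unsigned long trace_hits;

static void *model_alloc(size_t size, unsigned int flags);

/*
 * Stand-in for a hist trigger attached to the allocation event.
 * Like a map insertion, it has to allocate memory itself. Passing
 * ALLOC_NOTRACE keeps that allocation from firing the event again.
 */
static void trace_alloc_event(size_t size)
{
	void *elem;

	(void)size;
	trace_hits++;
	elem = model_alloc(16, ALLOC_NOTRACE);
	free(elem);
}

/* Allocator with a conditional tracepoint. */
static void *model_alloc(size_t size, unsigned int flags)
{
	void *p;

	assert(size > 0);
	p = malloc(size);
	/* Skip the event when the caller asked not to be traced. */
	if (!(flags & ALLOC_NOTRACE))
		trace_alloc_event(size);
	return p;
}
```

If trace_alloc_event() called model_alloc() with flags of 0 instead,
every traced allocation would trigger another traced allocation and
the model would recurse until the stack ran out.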

There are a couple of important bits of functionality that were
present in v1 but dropped in v2 mainly because I'm still trying to
figure out the best way to accomplish those things using the bpf_map
implementation.

The first is support for compound keys. Currently, maps can only be
keyed on a single event field, whereas in v1 they could be keyed on
multiple fields. With support for compound keys, you can create much
more interesting output, for example per-pid lists of syscalls or
read counts:

# echo 'hist:keys=common_pid.execname,id.syscall:vals=hitcount' > \
/sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger

# cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist

key: common_pid:bash[3112], id:sys_write vals: count:69
key: common_pid:bash[3112], id:sys_rt_sigprocmask vals: count:218

key: common_pid:update-notifier[3164], id:sys_poll vals: count:37
key: common_pid:update-notifier[3164], id:sys_recvfrom vals: count:118

key: common_pid:deja-dup-monito[3194], id:sys_sendto vals: count:1
key: common_pid:deja-dup-monito[3194], id:sys_read vals: count:4
key: common_pid:deja-dup-monito[3194], id:sys_poll vals: count:8
key: common_pid:deja-dup-monito[3194], id:sys_recvmsg vals: count:8
key: common_pid:deja-dup-monito[3194], id:sys_getegid vals: count:8

key: common_pid:emacs[3275], id:sys_fsync vals: count:1
key: common_pid:emacs[3275], id:sys_open vals: count:1
key: common_pid:emacs[3275], id:sys_symlink vals: count:2
key: common_pid:emacs[3275], id:sys_poll vals: count:23
key: common_pid:emacs[3275], id:sys_select vals: count:23
key: common_pid:emacs[3275], id:unknown_syscall vals: count:34
key: common_pid:emacs[3275], id:sys_ioctl vals: count:60
key: common_pid:emacs[3275], id:sys_rt_sigprocmask vals: count:116

key: common_pid:cat[3323], id:sys_munmap vals: count:1
key: common_pid:cat[3323], id:sys_fadvise64 vals: count:1
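
One way to picture a compound key is as several field values packed
into a single fixed-size blob that is hashed and compared bytewise.
The following is a userspace sketch under that assumption, not the v1
implementation and not the tracing_map API; struct compound_key,
hist_hit() and the table size are all hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: two event fields packed into one key. */
struct compound_key {
	uint32_t pid;		/* stands in for common_pid */
	uint32_t syscall_id;	/* stands in for id */
};

/* Zero the key first so every byte is defined before hashing. */
static void make_compound_key(struct compound_key *key,
			      uint32_t pid, uint32_t syscall_id)
{
	assert(key != NULL);
	memset(key, 0, sizeof(*key));
	key->pid = pid;
	key->syscall_id = syscall_id;
}

/* FNV-1a over the raw key bytes; any byte-oriented hash would do. */
static uint32_t hash_key(const void *key, size_t len)
{
	const unsigned char *p = key;
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

#define HIST_SLOTS 64	/* power of two, arbitrary for the sketch */

struct hist_entry {
	struct compound_key key;
	uint64_t count;
	int used;
};

static struct hist_entry hist[HIST_SLOTS];

/*
 * Bump the hit count for (pid, syscall_id) using linear probing.
 * Returns the new count, or 0 if the table is full.
 */
static uint64_t hist_hit(uint32_t pid, uint32_t syscall_id)
{
	struct compound_key key;
	uint32_t slot;
	unsigned int probe;

	make_compound_key(&key, pid, syscall_id);
	slot = hash_key(&key, sizeof(key)) & (HIST_SLOTS - 1);

	for (probe = 0; probe < HIST_SLOTS; probe++) {
		struct hist_entry *e = &hist[(slot + probe) & (HIST_SLOTS - 1)];

		if (!e->used) {
			/* Claim the empty slot for this key. */
			e->used = 1;
			e->key = key;
			e->count = 0;
		}
		if (memcmp(&e->key, &key, sizeof(key)) == 0)
			return ++e->count;
	}
	return 0;
}
```

Because the whole key is treated as opaque bytes, a map that only
understands fixed-size keys does not need to know how many fields went
into it.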

Related to that is support for sorting on multiple fields. Currently,
you can sort on only a single (primary) key. Being able to sort on
multiple keys, or at least on a secondary key, is indispensable for
seeing trends when displaying multiple values.
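
As an illustration of what a secondary sort key buys, the sketch below
orders rows by pid first and by hit count within each pid, which is the
ordering of the sample output above. It is plain userspace C using
qsort(), not code from this patchset; struct hist_row and the function
names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flattened histogram row. */
struct hist_row {
	uint32_t pid;
	uint32_t syscall_id;
	uint64_t count;
};

/* Three-way compare that cannot overflow the way a subtraction can. */
static int cmp_u64(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

/* Primary key: pid ascending. Secondary key: count ascending. */
static int cmp_pid_then_count(const void *lhs, const void *rhs)
{
	const struct hist_row *a = lhs;
	const struct hist_row *b = rhs;
	int c = cmp_u64(a->pid, b->pid);

	if (c != 0)
		return c;
	return cmp_u64(a->count, b->count);
}

static void sort_rows(struct hist_row *rows, size_t n)
{
	assert(rows != NULL || n == 0);
	qsort(rows, n, sizeof(*rows), cmp_pid_then_count);
}
```

Note that qsort() is not guaranteed to be stable, so rows that tie on
both keys may come out in any order.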

[1] http://thread.gmane.org/gmane.linux.kernel/1673551

Changes from v1:
- completely rewritten on top of tracing_map (renamed and exported bpf_map)
- added map clearing and client ops to tracing_map
- changed the name from 'hash' triggers to 'hist' triggers
- added new trigger 'pause' feature
- added new enable_hist and disable_hist triggers
- added usage for hist/enable_hist/disable_hist to tracing/README
- moved examples into Documentation/trace/events.txt
- added ___GFP_NOTRACE, kmalloc/kfree macros, and conditional kmem tracepoints

The following changes since commit 49058038a12cfd9044146a1bf4b286781268d5c9:

ring-buffer: Do not wake up a splice waiter when page is not full (2015-02-24 14:00:41 -0600)

are available in the git repository at:

git://git.yoctoproject.org/linux-yocto-contrib.git tzanussi/hist-triggers-v2
http://git.yoctoproject.org/cgit/cgit.cgi/linux-yocto-contrib/log/?h=tzanussi/hist-triggers-v2

Tom Zanussi (15):
tracing: Make ftrace_event_field checking functions available
tracing: Add event record param to trigger_ops.func()
tracing: Add get_syscall_name()
bpf: Export bpf map functionality as trace_map_*
bpf: Export a map-clearing function
bpf: Add tracing_map client ops
mm: Add ___GFP_NOTRACE
tracing: Make kmem memory allocation tracepoints conditional
tracing: Add kmalloc/kfree macros
bpf: Make tracing_map use kmalloc/kfree_notrace()
tracing: Add a per-event-trigger 'paused' field
tracing: Add 'hist' event trigger command
tracing: Add sorting to hist triggers
tracing: Add enable_hist/disable_hist triggers
tracing: Add 'hist' trigger Documentation

Documentation/trace/events.txt | 870 +++++++++++++++++++++
include/linux/bpf.h | 15 +
include/linux/ftrace_event.h | 9 +-
include/linux/gfp.h | 3 +-
include/linux/slab.h | 61 +-
include/trace/events/kmem.h | 28 +-
kernel/bpf/arraymap.c | 16 +
kernel/bpf/hashtab.c | 39 +-
kernel/bpf/syscall.c | 193 ++++-
kernel/trace/trace.c | 48 ++
kernel/trace/trace.h | 25 +-
kernel/trace/trace_events.c | 3 +
kernel/trace/trace_events_filter.c | 15 +-
kernel/trace/trace_events_trigger.c | 1466 ++++++++++++++++++++++++++++++++++-
kernel/trace/trace_syscalls.c | 11 +
mm/slab.c | 45 +-
mm/slob.c | 45 +-
mm/slub.c | 47 +-
18 files changed, 2795 insertions(+), 144 deletions(-)

--
1.9.3
