Re: [PATCH 00/29] tracing: 'hist' triggers

From: Tom Zanussi
Date: Fri Feb 12 2016 - 11:17:46 EST


Hi Steve,

On Thu, 2015-12-10 at 12:50 -0600, Tom Zanussi wrote:
> This is v13 of the 'hist triggers' patchset, which is the same as v12

What do you think about merging this? It's been pretty well reviewed
and thoroughly tested. In fact, I had it running my function_hist [1]
tracer over the holidays for a couple weeks and it made it through
literally a trillion hits ;-) :

ip: [ffffffff8122a010] __pollwait hitcount: 6938908762
ip: [ffffffff810dbfb0] add_wait_queue hitcount: 7000101527
ip: [ffffffff810dc150] remove_wait_queue hitcount: 7000101540
ip: [ffffffffa0174ed0] i915_mutex_lock_interruptible hitcount: 7051615201
ip: [ffffffff811089b0] ktime_get hitcount: 7421729276
ip: [ffffffffa0065270] drm_ioctl_permit hitcount: 7510013336
ip: [ffffffffa00655b0] drm_ioctl hitcount: 7510013336
ip: [ffffffff8131e500] avc_has_extended_perms hitcount: 7510112785
ip: [ffffffff812293d0] do_vfs_ioctl hitcount: 7510158218
ip: [ffffffff8131bc80] security_file_ioctl hitcount: 7510158218
ip: [ffffffff81322a20] selinux_file_ioctl hitcount: 7510158218
ip: [ffffffff81229870] SyS_ioctl hitcount: 7510159033
ip: [ffffffff81759cb0] mutex_lock_interruptible hitcount: 8292073702
ip: [ffffffffa0178a30] i915_gem_obj_to_ggtt_view hitcount: 9215375535
ip: [ffffffff8175a2c0] mutex_unlock hitcount: 10087997821
ip: [ffffffff816fa0b0] unix_poll hitcount: 10604309681
ip: [ffffffff81003110] do_audit_syscall_entry hitcount: 12056938450
ip: [ffffffff81138190] __audit_syscall_entry hitcount: 12056938450
ip: [ffffffff810036d0] syscall_trace_enter_phase1 hitcount: 12056938450
ip: [ffffffff81096c2e] syscall_slow_exit_work hitcount: 12056939299
ip: [ffffffff81138290] __audit_syscall_exit hitcount: 12056939299
ip: [ffffffff81136690] unroll_tree_refs hitcount: 12056984860
ip: [ffffffff8121ff70] path_put hitcount: 12107933023
ip: [ffffffff81235b20] mntput hitcount: 12173926118
ip: [ffffffff8122cc50] dput hitcount: 12191510421
ip: [ffffffff81630060] sock_poll hitcount: 12323576556
ip: [ffffffff81107b30] current_kernel_time64 hitcount: 12409041480
ip: [ffffffff81218270] fput hitcount: 12618060750
ip: [ffffffff811f6790] kfree hitcount: 17023681707
ip: [ffffffff810ccc20] __compute_runnable_contrib hitcount: 17226816197
ip: [ffffffffa0166350] gen7_render_get_cmd_length_mask hitcount: 18551173520
ip: [ffffffff817589b0] _cond_resched hitcount: 19242157574
ip: [ffffffff8175bd60] _raw_spin_lock hitcount: 19275222963
ip: [ffffffff81232c60] __fdget hitcount: 22865629762
ip: [ffffffff81232c00] __fget_light hitcount: 24594932641
ip: [ffffffff8175bed0] _raw_spin_lock_irqsave hitcount: 26510496751
ip: [ffffffff8175bb30] _raw_spin_unlock_irqrestore hitcount: 26525916042
ip: [ffffffffa018d6c0] gen6_ring_get_seqno hitcount: 95888137753

Totals:
Hits: 1020596753338
Entries: 7102
Dropped: 0

[1]
http://git.yoctoproject.org/cgit/cgit.cgi/linux-yocto-contrib/commit/?h=tzanussi/func-hist-v0&id=fa123181a068574bd9e300416cc947aa7424d808

Thanks,

Tom

> but incorporates a minor coding suggestion from Namhyung (and drops
> one patch now in tracing/for-next).
>
> Changes from v12:
>
> - Removed the unnecessary return val check and NULL assignment
> pointed out by Namhyung, and did a small cleanup to make the error
> handling in that function (create_hist_data()) more uniform, which
> also included a small change to create_sort_keys().
> - Removed the 'update cond flag' patch from the series since that's
> now in tracing/for-next.
>
> Changes from v11:
>
> - Added tracing_map_lookup().
> - Changed tracing_map_insert() to incorporate the hits/drops counter
> updating. The changes allow clients to continue calling
> tracing_map_insert() and update existing entries if desired, or
> stop on the first drop, since the return value is still NULL in
> that case.
> - Changed the hist trigger code to continue calling
> tracing_map_insert() rather than stopping on the first drop. The
> idea is that there's no harm in continuing after the first drop -
> if the user can't tolerate any drops, they'll still see a non-zero
> drop count and can retry with a larger table, but if the user
> doesn't care, the hit and drop counts are an accurate reflection
> of how many events were tallied vs dropped.
> - Added a new unreg_all() callback to event_command, and code to
> call it when the trigger file is opened for truncate. Implemented
> unreg_all() for hist triggers, which allows them to honor the
> truncate flag while leaving existing triggers intact. This means
> that '>>' should now be used for adding multiple hist triggers as
> well as to clear/pause/cont an existing hist trigger.
> - Included Namhyung's log2 patch.
> - Fixed one of Masami's test cases which now requires using '>>' due
> to the truncate changes.
> - 'cont' and 'clear' no longer create a trigger if one doesn't exist.
> - Rebased to latest trace/for-next.
> - Various other smaller fixes and Documentation updates as noted by
> Namhyung.
>
> Changes from v10:
>
> - Rebased to latest trace/for-next.
> - Added Masami's ftrace/hist triggers kselftest patches.
> - Added Masami's Tested-by: to the series.
>
> Changes from v9:
>
> v10 adds some major new features - 'named' hist triggers and support
> for multiple hist triggers on a single event - basically all of the
> features and suggestions suggested by Masami, except for one: I
> decided after all not to implement the 'pause/continue/clear' commands
> on the hist file. I've gotten so used to the habit of calling up the
> previous trigger-setting command in the command history and simply
> appending a 'pause/cont/clear' command to it (and later removing it by
> prepending a '!') that making the user change the command doesn't seem
> as naturally usable. Masami, if you disagree, let me know and I'll
> reconsider it but at this point I'd rather not make that change.
>
> In addition, I think I've fixed all the other various smaller problems
> mentioned in the previous review, please let me know if I missed any...
>
> - Added support for 'named' hist triggers
> - Added support for multiple hist triggers on a single trace event
> - Added documentation and examples for the above new features
> - Streamlined the README to only include the essentials for hist triggers
> - Fixed the 'hex' modifier for values, which was previously ignored
> - Replaced the kmalloc of large arrays with a simple page-based scheme
> - Fixed the tracing_map_insert() case where jhash returns 0
> - Fixed various other small problems noted along the way
>
> Changes from v8:
>
> Same as v8, but with the RFC patch [ftrace: Add function_hist tracer]
> removed, and rebased to latest trace/for-next.
>
> Changes from v7:
>
> This version refactors the commits as suggested by Masami. There are
> now more commits, but the result should be much more reviewable. The
> ending code is the same as before, modulo a couple minor bug fixes I
> discovered while refactoring and testing.
>
> I've also reviewed and fixed a number of shortcomings and errors in
> the comments, and have added a new discussion of the tracing_map data
> structures after Steve mentioned he found them confusing and/or
> insufficiently documented.
>
> Also, I kept Namhyung's string patch [tracing: Support string type key
> properly] as submitted, but added a follow-on patch that refactors it
> and fixes a problem I found with it that enabled static string keys to
> contain random chars and therefore incorrect map insertions.
>
> Changes from v6:
>
> This version adds a new 'sym-offset' modifier as requested by Masami.
> I implemented it as a modifier rather than using the trace option as
> suggested, in part because I wanted to keep it all self-contained and
> it seemed more consistent to just add it alongside the 'sym' modifier.
> Also, hist triggers arent't really a tracer and therefore don't
> directly tie into the option update/callback mechanism so making use
> of it isn't as simple as a normal tracer.
>
> I also changed the sort key specification to be stricter and signal an
> error if the specified sort key wasn't found (rather than defaulting
> to hitcount in those cases), also suggested by Masami. Thanks,
> Masami, for your input!
>
> Also updated the Documentation and tracing/README to reflect the
> changes.
>
> Changes from v5:
>
> This version adds support for compound keys, along with the related
> ability to sort using primary and secondary keys. This was mentioned
> in previous versions as the last important piece that remained
> unimplemented, and is now implemented. (I didn't have time to get to
> the couple of enhancements suggested by Masami, but I expect to be
> able to add those later on top of these.)
>
> Because we now support compound keys and it's not immediately clear in
> the output exactly which fields correspond to keys, the key(s),
> compound or not, are now enclosed by curly braces.
>
> The Documentation and README have been updated to reflect the changes,
> and several new examples have been added to illustrate how to use
> compound keys.
>
> Also, the code was updated to work with the new ftrace_event_file,
> etc, renaming in tracing/for-next.
>
> Changes from v4:
>
> This version addresses some problems and suggestions made by Daniel
> Wagner - a lot of the code was reworked to get rid of the distinction
> between keys and values, and as a result, both keys and values can be
> used as sort keys. As suggested, it also allows 'val=' to be absent
> in a trigger command - if no 'val' is specified, hitcount is assumed
> and automatically used as the only val.
>
> The map code was also separated out into a separate file,
> tracing_map.c, allowing it to be reused. It also adds a second tracer
> called function_hist that actually does reuse the code, as an RFC
> patch.
>
> Patch 01/10 [tracing: Update cond flag when enabling or disabling..]
> is a fix for a problem noticed by Daniel and that fixes a problem in
> existing trigger code and should be applied regardless of whether the
> rest of the patchset is merged.
>
> As mentioned, patch 10/10 is an RFC patch implementing a new tracer
> based on the function tracer code. It's a fun little tool and is
> useful for a specific problem I'm working on (and is also a nice test
> of the tracing_map code), but is an RFC because first, I'm not sure it
> would really be of general interest and secondly, it's POC-level
> quality and I'd need to spend more time fixing it up to make it
> upstreamable, but I don't want to waste my time if not.
>
> There are a couple of important bits of functionality that were
> present in v1 but not yet reimplemented in v5.
>
> The first is support for compound keys. Currently, maps can only be
> keyed on a single event field, whereas in v1 they could be keyed on
> multiple keys. With support for compound keys, you can create much
> more interesting output, such as for example per-pid lists of
> syscalls or read counts e.g.:
>
> # echo 'hist:keys=common_pid.execname,id.syscall:vals=hitcount' > \
> /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
>
> # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
>
> key: common_pid:bash[3112], id:sys_write vals: count:69
> key: common_pid:bash[3112], id:sys_rt_sigprocmask vals: count:218
>
> key: common_pid:update-notifier[3164], id:sys_poll vals: count:37
> key: common_pid:update-notifier[3164], id:sys_recvfrom vals: count:118
>
> key: common_pid:deja-dup-monito[3194], id:sys_sendto vals: count:1
> key: common_pid:deja-dup-monito[3194], id:sys_read vals: count:4
> key: common_pid:deja-dup-monito[3194], id:sys_poll vals: count:8
> key: common_pid:deja-dup-monito[3194], id:sys_recvmsg vals: count:8
> key: common_pid:deja-dup-monito[3194], id:sys_getegid vals: count:8
>
> key: common_pid:emacs[3275], id:sys_fsync vals: count:1
> key: common_pid:emacs[3275], id:sys_open vals: count:1
> key: common_pid:emacs[3275], id:sys_symlink vals: count:2
> key: common_pid:emacs[3275], id:sys_poll vals: count:23
> key: common_pid:emacs[3275], id:sys_select vals: count:23
> key: common_pid:emacs[3275], id:unknown_syscall vals: count:34
> key: common_pid:emacs[3275], id:sys_ioctl vals: count:60
> key: common_pid:emacs[3275], id:sys_rt_sigprocmask vals: count:116
>
> key: common_pid:cat[3323], id:sys_munmap vals: count:1
> key: common_pid:cat[3323], id:sys_fadvise64 vals: count:1
>
> Related to that is support for sorting on multiple fields. Currently,
> you can sort using only a primary key. Being able to sort on multiple
> or at least a secondary key is indispensible for seeing trends when
> displaying multiple values.
>
> Changes from v3:
>
> v4 fixes the race in tracing_map_insert() noted in v3, where
> map.val.key could be checked even if map.val wasn't yet set. The
> simple fix for that in tracing_map_insert() introduces the possibility
> of duplicates in the map, which though rare, need to be accounted for
> in the output. To address that, duplicate-merging code was added to
> the map-printing code.
>
> It was also pointed out that it didn't seem correct to include
> module.h, but the fix for that has deeper roots and is being addressed
> by a separate patchset; for now we need to continue including
> module.h, though prompted by that I did some other header include
> cleanup.
>
> The functionality remains the same as v2, but this version no longer
> tries to export and use bpf_maps, and more importantly removes the
> associated GFP_NOTRACE/trace event hacks and kmem macros required to
> work around the bpf_map implementation.
>
> The tracing_map functionality is instead built on top of a simple
> lock-free map algorithm originated by Dr. Cliff Click (see references
> in the code for more details), which though too restrictive to be
> general-purpose in its current form, functions nicely as a
> special-purpose tracing map.
>
> v3 also moves the hist triggers code into a separate file and puts it
> all behind a new config option, CONFIG_HIST_TRIGGERS. It also merges
> in the sorting code rather than keeping it as a separate patch.
>
> This patchset also includes a couple other new and related triggers,
> enable_hist and disable_hist, very similar to the existing
> enable_event/disable_event triggers used to automatically enable and
> disable events based on a triggering condition, but in this case
> allowing hist triggers to be enabled and disabled in the same way.
>
> - Added an insert check for val before checking the key associated with val
> - Added code to merge possible duplicates in the map
>
> Changes from v2:
> - reimplemented tracing_map, replacing bpf_map with nmi-safe/lock-free map
> - removed GPF_NOTRACE, kmalloc/free macros and event hacks needed by bpf_maps
> - moved hist triggers from trace_events_trigger.c to trace_events_hist.c
> - added CONFIG_HIST_TRIGGERS config option
> - consolidated sorting code with main patch
>
> Changes from v1:
> - completely rewritten on top of tracing_map (renamed and exported bpf_map)
> - added map clearing and client ops to tracing_map
> - changed the name from 'hash' triggers to 'hist' triggers
> - added new trigger 'pause' feature
> - added new enable_hist and disable_hist triggers
> - added usage for hist/enable_hist/disable hist to tracing/README
> - moved examples into Documentation/trace/event.txt
> - added ___GFP_NOTRACE, kmalloc/kfree macros, and conditional kmem tracepoints
>
> The following changes since commit 39daa7b9e89512f234b7fb5d55812a78318251fc:
>
> ftrace: Show all tramps registered to a record on ftrace_bug() (2015-11-25 16:04:59 -0500)
>
> are available in the git repository at:
>
> git://git.yoctoproject.org/linux-yocto-contrib.git tzanussi/hist-triggers-v13
> http://git.yoctoproject.org/cgit/cgit.cgi/linux-yocto-contrib/log/?h=tzanussi/hist-triggers-v13
>
> Masami Hiramatsu (2):
> kselftests/ftrace : Add event trigger testcases
> kselftests/ftrace: Add hist trigger testcases
>
> Namhyung Kim (2):
> tracing: Support string type key properly
> tracing: Add hist trigger 'log2' modifier
>
> Tom Zanussi (25):
> tracing: Make ftrace_event_field checking functions available
> tracing: Make event trigger functions available
> tracing: Add event record param to trigger_ops.func()
> tracing: Add get_syscall_name()
> tracing: Add a per-event-trigger 'paused' field
> tracing: Add needs_rec flag to event triggers
> tracing: Add an unreg_all() callback to trigger commands
> tracing: Add lock-free tracing_map
> tracing: Add 'hist' event trigger command
> tracing: Add hist trigger support for multiple values ('vals=' param)
> tracing: Add hist trigger support for compound keys
> tracing: Add hist trigger support for user-defined sorting ('sort='
> param)
> tracing: Add hist trigger support for pausing and continuing a trace
> tracing: Add hist trigger support for clearing a trace
> tracing: Add hist trigger 'hex' modifier for displaying numeric fields
> tracing: Add hist trigger 'sym' and 'sym-offset' modifiers
> tracing: Add hist trigger 'execname' modifier
> tracing: Add hist trigger 'syscall' modifier
> tracing: Add hist trigger support for stacktraces as keys
> tracing: Remove restriction on string position in hist trigger keys
> tracing: Add enable_hist/disable_hist triggers
> tracing: Add 'hist' trigger Documentation
> tracing: Add support for multiple hist triggers per event
> tracing: Add support for named triggers
> tracing: Add support for named hist triggers
>
> Documentation/trace/events.txt | 1555 ++++++++++++++++++
> include/linux/trace_events.h | 9 +-
> kernel/trace/Kconfig | 27 +
> kernel/trace/Makefile | 2 +
> kernel/trace/trace.c | 57 +
> kernel/trace/trace.h | 106 +-
> kernel/trace/trace_events.c | 4 +
> kernel/trace/trace_events_filter.c | 12 -
> kernel/trace/trace_events_hist.c | 1733 ++++++++++++++++++++
> kernel/trace/trace_events_trigger.c | 299 +++-
> kernel/trace/trace_syscalls.c | 11 +
> kernel/trace/tracing_map.c | 1058 ++++++++++++
> kernel/trace/tracing_map.h | 282 ++++
> tools/testing/selftests/ftrace/test.d/functions | 9 +
> .../ftrace/test.d/trigger/trigger-eventonoff.tc | 64 +
> .../ftrace/test.d/trigger/trigger-filter.tc | 59 +
> .../ftrace/test.d/trigger/trigger-hist-mod.tc | 65 +
> .../ftrace/test.d/trigger/trigger-hist.tc | 83 +
> .../ftrace/test.d/trigger/trigger-multihist.tc | 73 +
> .../ftrace/test.d/trigger/trigger-snapshot.tc | 56 +
> .../ftrace/test.d/trigger/trigger-stacktrace.tc | 53 +
> .../ftrace/test.d/trigger/trigger-traceonoff.tc | 58 +
> 22 files changed, 5592 insertions(+), 83 deletions(-)
> create mode 100644 kernel/trace/trace_events_hist.c
> create mode 100644 kernel/trace/tracing_map.c
> create mode 100644 kernel/trace/tracing_map.h
> create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/trigger-eventonoff.tc
> create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/trigger-filter.tc
> create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/trigger-hist-mod.tc
> create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/trigger-hist.tc
> create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/trigger-multihist.tc
> create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/trigger-snapshot.tc
> create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/trigger-stacktrace.tc
> create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/trigger-traceonoff.tc
>