Re: [PATCH 0/2] ftrace: updates to tip

From: Chris Mason
Date: Fri Jan 16 2009 - 16:49:00 EST


On Thu, 2009-01-15 at 19:40 -0500, Steven Rostedt wrote:
> Ingo,
>
> The first patch is critical, and needs to stay with trace_output.c
> Not that critical since trace_output.c is not in mainline yet.
>
> The second patch gives the ability to stack trace functions.
> I've been leery about adding this and still keep it a separate
> option from the "stacktrace" that already exists. This is because
> when enabled with no filtering, the lag between typing and seeing
> what is typed can be up to 10 seconds or more.
>

I mostly asked for this because I often try to find the most common
reason a given function is called, and oprofile isn't always a great way
to catch it. systemtap can do it too, if you can get it to work against
your current devel kernel, but there are limits on how much memory it'll
use.

I've attached some simple python code that parses the output of the
function stack tracer. It makes some dumb assumptions about the format,
but it isn't a bad proof of concept. The first such assumption is that
you're only filtering on a single function.
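
As for actually turning the tracer on, something along these lines is
what I mean. The debugfs mount point and the func_stack_trace option
name below are my assumptions for illustration, so double check them
against the tree you have the patches in. Note that setting the filter
before enabling stack traces matters, or you get the no-filter lag
Steve describes above.

#!/usr/bin/env python
#
# rough sketch: point the function tracer at a single function and
# stream trace_pipe to stdout, so it can be piped into the parser
# attached below.  the tracing paths and the func_stack_trace option
# name are assumptions here, check them against your kernel.

import sys

tracing = '/sys/kernel/debug/tracing'

def knob(name, val):
    f = open(tracing + '/' + name, 'w')
    f.write(val)
    f.close()

knob('current_tracer', 'function')
# the parser assumes a single function in the filter; set it before
# enabling stack traces so we don't trace the whole box
knob('set_ftrace_filter', 'native_smp_send_reschedule')
knob('options/func_stack_trace', '1')

# stream the trace; pipe this into the parser and ctrl-c when done
pipe = open(tracing + '/trace_pipe')
while True:
    line = pipe.readline()
    if not line:
        break
    sys.stdout.write(line)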

Here is some example output from trying to find the most common causes
of native_smp_send_reschedule() during a btrfs dbench run.

It relates to the Oracle OLTP thread because Oracle heavily uses IPC
semaphores to trigger wakeups of processes as various events finish.
I'd bet that try_to_wake_up is the most common cause of the reschedule
calls there as well.

For btrfs, the btree lock mutexes come back into the profile yet again.
It would be interesting to change the spinning mutex code to look for
spinners and skip the wakeup on unlock, but that's a different thread
entirely.

The short version is: thanks Steve, this is really cool!

12058 hits:
<= check_preempt_wakeup
<= try_to_wake_up
<= wake_up_process
<= __mutex_unlock_slowpath
<= mutex_unlock
<= btrfs_tree_unlock
<= unlock_up
===========

11678 hits:
<= check_preempt_curr_idle
<= try_to_wake_up
<= wake_up_process
<= __mutex_unlock_slowpath
<= mutex_unlock
<= btrfs_tree_unlock
<= unlock_up
===========

1569 hits:
<= check_preempt_wakeup
<= try_to_wake_up
<= default_wake_function
<= autoremove_wake_function
<= wake_bit_function
<= __wake_up_common
<= __wake_up
===========

1505 hits:
<= check_preempt_curr_idle
<= try_to_wake_up
<= default_wake_function
<= autoremove_wake_function
<= wake_bit_function
<= __wake_up_common
<= __wake_up
===========

1377 hits: # the output isn't perfectly parsable, or I'm dumb
<= 0
<= 0
<= 0
<= 0
<= 0
<= 0
<= 0
===========

1194 hits:
<= check_preempt_curr_idle
<= try_to_wake_up
<= wake_up_process
<= btrfs_queue_worker
<= schedule_bio
<= btrfs_map_bio
<= __btrfs_submit_bio_done
===========

1108 hits:
<= check_preempt_curr_idle
<= try_to_wake_up
<= default_wake_function
<= autoremove_wake_function
<= __wake_up_common
<= __wake_up
<= cache_drop_leaf_ref
===========

935 hits:
<= check_preempt_wakeup
<= try_to_wake_up
<= wake_up_process
<= btrfs_queue_worker
<= schedule_bio
<= btrfs_map_bio
<= __btrfs_submit_bio_done
===========

856 hits:
<= check_preempt_curr_idle
<= try_to_wake_up
<= wake_up_process
<= btrfs_queue_worker
<= btrfs_wq_submit_bio
<= btrfs_submit_bio_hook
<= submit_one_bio
===========

502 hits:
<= check_preempt_wakeup
<= try_to_wake_up
<= wake_up_process
<= btrfs_queue_worker
<= btrfs_wq_submit_bio
<= btrfs_submit_bio_hook
<= submit_one_bio
===========

442 hits:
<= check_preempt_curr_idle
<= try_to_wake_up
<= wake_up_process
<= __mutex_unlock_slowpath
<= mutex_unlock
<= btrfs_tree_unlock
<= btrfs_release_path
===========

327 hits:
<= check_preempt_wakeup
<= try_to_wake_up
<= default_wake_function
<= autoremove_wake_function
<= __wake_up_common
<= __wake_up
<= cache_drop_leaf_ref
===========


... output continues for a bit


#!/usr/bin/env python
#
# this is a fairly trivial parser of the ftrace function stack trace output
# the idea is to find the most common stack traces that lead to calling a
# given function, even those that don't use much CPU time and won't
# show up in other profilers.
#
# It assumes ftrace output looks like this:
# some text including the function name
# <= func
# <= func2
# <= func3
# some text including the function name
# <= func4
# <= func5
# <= func6
#
# The traces above would generate two histogram entries, one for
# func,func2,func3 and one for func4,func5,func6
#

import sys, signal

stacks = {}

last = []
printed = False

# print the most common stacks, sorted by hit count.  This doubles as
# the SIGINT/SIGTERM handler, which is why it takes two arguments.
def summary(arg1, arg2):
    global printed

    printed = True
    count = {}

    # invert stacks: map each hit count to the list of traces that got it
    for lines, c in stacks.items():
        count.setdefault(c, []).append(lines + ' ===========\n')

    # print at most the top 50 stacks
    limit = 50
    for x in sorted(count.keys(), reverse=True):
        print "%d hits: \n%s" % (x, "".join(count[x]))
        limit -= 1
        if limit <= 0:
            break
    sys.exit(0)

signal.signal(signal.SIGTERM, summary)
signal.signal(signal.SIGINT, summary)

while True:
    try:
        l = sys.stdin.readline()
    except:
        # a signal while we're blocked in readline lands here after
        # the handler has run
        break

    if not l:
        break

    if l[0] == '#':
        continue
    if not l.startswith(' <='):
        # anything that isn't a stack line ends the current trace
        if last:
            lines = ''.join(last)
            val = stacks.get(lines, 0) + 1
            stacks[lines] = val
        # we don't put the header line into the trace because that might
        # have timestamp info in it.
        last = []
    else:
        last.append(l)

# count any trace still pending when stdin ran dry
if last:
    lines = ''.join(last)
    stacks[lines] = stacks.get(lines, 0) + 1

if not printed:
    summary(1, 2)
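
To tie the two together, something like this is what I end up running
(the script names are just what I happened to save them as):

	./trace-setup.py | ./stack-histo.py

Let dbench run for a while, then ctrl-c the pipeline; SIGINT goes to
both processes, and the parser prints the histogram from its signal
handler on the way out.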