Re: [RFC PATCHSET RESEND] ioblame: statistical IO analyzer

From: Namhyung Kim
Date: Fri Jan 06 2012 - 04:00:33 EST


Hi, Tejun

2012-01-06 AM 8:42, Tejun Heo wrote:
This is a re-post. The original posting was on Dec 15th but it was
missing a cc to LKML. I got some responses on that thread, so I didn't
suspect LKML was missing. If you're getting it for the second time, my
apologies.

Things pointed out in the original thread are...

* Is the quick variant of backtrace gathering really necessary? -
Still need to get performance numbers.

* TRACE_EVENT_CONDITION() can be used in some places. - Will be
updated.

Original message follows. Thanks.

Hello, guys.

Even with blktrace and tracepoints, getting insight into the IOs going
on in a system is very challenging. A lot of IO operations happen long
after the action which triggered the IO has finished, and the overall
asynchronous nature of IO operations makes it difficult to trace back
the origin of a given IO.

ioblame is an attempt at providing better visibility into overall IO
behavior. ioblame hooks into various tracepoints, tries to determine
who caused any given IO and how, and charges the IO accordingly.

On each IO completion, ioblame knows who to charge the IO to (task),
how the IO got triggered (stack trace at the point of triggering, be it
page dirtying, inode dirtying or direct IO issue) and various
information about the IO itself (offset, size, how long it took and so
on). ioblame collects this information into histograms which are
configurable from userland using a debugfs interface.

For example, using ioblame, a user can acquire information like "this
task triggered IO with this stack trace on this file with the
following offset distribution".
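To make the charging model concrete, here is a minimal userspace C sketch, not ioblame's actual code: each IO completion is charged to a key of (pid, stack trace id, inode), and its offset is counted in a log2 histogram. The names blame_key, charge_io and log2_bucket, the fixed-size table, and the log2 bucketing are all assumptions for illustration. The real patch configures its histograms through debugfs, which is not modeled here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_BUCKETS 32 /* log2 offset buckets; the last one also absorbs larger offsets */
#define MAX_KEYS   64 /* hypothetical fixed table size, enough for a sketch */

/* Hypothetical charge key: who (pid), how (stack trace id), where (inode). */
struct blame_key {
	int pid;
	uint32_t stack_id;
	uint64_t ino;
};

/* Per-key statistics accumulated over IO completions. */
struct blame_hist {
	struct blame_key key;
	uint64_t nr_ios;
	uint64_t bytes;
	uint64_t offset_buckets[NR_BUCKETS];
};

static struct blame_hist hists[MAX_KEYS];
static int nr_hists;

/* Bucket 0 holds offsets 0 and 1; bucket i (i >= 1) holds [2^i, 2^(i+1)). */
static int log2_bucket(uint64_t v)
{
	int b = 0;

	while (v > 1 && b < NR_BUCKETS - 1) {
		v >>= 1;
		b++;
	}
	return b;
}

static int key_equal(const struct blame_key *a, const struct blame_key *b)
{
	return a->pid == b->pid && a->stack_id == b->stack_id && a->ino == b->ino;
}

/* Find the histogram for a key, creating it on first use; NULL if the table is full. */
static struct blame_hist *lookup_hist(const struct blame_key *k)
{
	for (int i = 0; i < nr_hists; i++)
		if (key_equal(&hists[i].key, k))
			return &hists[i];
	if (nr_hists == MAX_KEYS)
		return NULL;
	memset(&hists[nr_hists], 0, sizeof(hists[nr_hists]));
	hists[nr_hists].key = *k;
	return &hists[nr_hists++];
}

/* Charge one completed IO to its key and record its offset in the histogram. */
static void charge_io(const struct blame_key *k, uint64_t offset, uint64_t size)
{
	struct blame_hist *h;

	assert(k != NULL);
	h = lookup_hist(k);
	if (!h)
		return; /* table full: drop the sample */
	h->nr_ios++;
	h->bytes += size;
	h->offset_buckets[log2_bucket(offset)]++;
}
```

In this sketch a completion hook would call charge_io() once per finished IO. Walking hists[] afterwards gives, per task, stack trace and file, the IO count, total bytes and offset distribution described above.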

For more details, please read Documentation/trace/ioblame.txt, which
I'll append to this message too for discussion.

This patchset contains the following 11 patches.

0001-trace_event_filter-factorize-filter-creation.patch
0002-trace_event_filter-add-trace_event_filter_-interface.patch
0003-block-block_bio_complete-tracepoint-was-missing.patch
0004-block-add-req-to-bio_-front-back-_merge-tracepoints.patch
0005-block-abstract-disk-iteration-into-disk_iter.patch
0006-writeback-move-struct-wb_writeback_work-to-writeback.patch
0007-writeback-add-more-tracepoints.patch
0008-block-add-block_touch_buffer-tracepoint.patch
0009-vfs-add-fcheck-tracepoint.patch
0010-stacktrace-implement-save_stack_trace_quick.patch
0011-block-trace-implement-ioblame-IO-statistical-analyze.patch

0001-0002 export trace_event_filter so that ioblame can use it too.

0003 adds back the block_bio_complete TP invocation, which got lost
somehow. This probably makes sense as a fix patch for 3.2.

0004-0006 update the block layer in preparation. 0005 probably makes
sense as a standalone patch too.

0007-0009 add more tracepoints along the IO stack.

0010 adds a nimbler backtrace dump function, as ioblame dumps stack
traces extremely frequently.

0011 implements ioblame.

This is still at an early stage and I haven't done much performance
analysis yet. Tentative testing shows it adds ~20% CPU overhead when
used on a memory-backed loopback device.

The patches are on top of mainline (42ebfc61cf "Merge branch
'stable...git/konrad/xen'") and perf/core (74eec26fac "perf tools: Add
ability to synthesize event according to a sample").

It's also available in the following git branch.

git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git review-ioblame


Very interesting. It should help analyzing and improving IO performance a lot.

BTW, it seems ioblame is based on the event tracing feature, so couldn't it be implemented in userspace with the help of the tracepoints and the additional information (e.g. intent, ...) you add? perf can deal with them and extend its post-processing capability easily, and it might also reduce some work in the kernel, I guess.
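As a rough illustration of the userspace alternative, the sketch below aggregates per-task IO counts and bytes from a stream of trace records. The one-line record format "pid=<pid> bytes=<bytes>" and the names account_record/account_stream are assumptions for illustration only. Real perf or ftrace output looks different and would need its own parser.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_PIDS 128 /* hypothetical fixed table size */

struct pid_total {
	int pid;
	uint64_t ios;
	uint64_t bytes;
};

static struct pid_total totals[MAX_PIDS];
static int nr_totals;

/* Parse one record in the hypothetical "pid=<pid> bytes=<bytes>" format
 * and add it to that pid's totals; returns 0 on success, -1 otherwise. */
static int account_record(const char *line)
{
	int pid;
	uint64_t bytes;

	if (sscanf(line, "pid=%d bytes=%" SCNu64, &pid, &bytes) != 2)
		return -1;
	for (int i = 0; i < nr_totals; i++) {
		if (totals[i].pid == pid) {
			totals[i].ios++;
			totals[i].bytes += bytes;
			return 0;
		}
	}
	if (nr_totals == MAX_PIDS)
		return -1; /* table full: drop the record */
	totals[nr_totals].pid = pid;
	totals[nr_totals].ios = 1;
	totals[nr_totals].bytes = bytes;
	nr_totals++;
	return 0;
}

/* Read records line by line (e.g. from stdin fed by a trace dump) and aggregate them. */
static void account_stream(FILE *f)
{
	char buf[256];

	assert(f != NULL);
	while (fgets(buf, sizeof(buf), f))
		account_record(buf);
}
```

The trade-off is that every event has to cross into userspace before it can be aggregated, whereas in-kernel histograms only export the summary.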

Thanks
Namhyung Kim