Re: [RFC PATCHSET take#2] ioblame: IO tracer with origin tracking

From: David Sharp
Date: Wed Jan 11 2012 - 17:46:19 EST


On Wed, Jan 11, 2012 at 9:02 AM, Tejun Heo <tj@xxxxxxxxxx> wrote:
> Hello, Frederic.
>
> On Wed, Jan 11, 2012 at 03:40:14PM +0100, Frederic Weisbecker wrote:
>> I think this has been asked before. So sorry for asking twice.
>
> I thought Namhyung was primarily asking about stat gathering which is
> chopped now.
>
>> But I'm wondering why the post processing is made from the kernel. Do you think
>> it would be possible to pull that out in userspace. We have some nice scripting
>> framework for post processing of trace events in perf tools for example.
>>
>> If it's not possible please tell us why. We really would like to avoid adding such
>> a big piece of code in the tracing subsystem if possible.
>
> I suppose you're talking about the state tracking by post-processing,
> right?
>
> * ioblame tracks stack trace for each dirtying operation.  If we don't
>   want further state tracking in kernel, we would have to export the
>   whole stack trace on each dirtying operation which can be high
>   frequency.  Also, is there an efficient way to export variable
>   length data via TPs?  If so, it can be somewhat better but still not
>   very good.

See __dynamic_array. It imposes a 4-byte overhead to store the offset
and length of data within the trace event.
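
To make the layout concrete, here is a rough userspace model of what a
__dynamic_array field looks like in a record. It is only a sketch: the
struct and the demo_* helpers are made up for illustration and are not
kernel code. The one detail taken from the real macros is the 32-bit
locator word, which packs the payload offset in its low 16 bits and the
payload length in its high 16 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical event record, modelled on an event with one
 * __dynamic_array() field: fixed fields, a 32-bit locator word, then the
 * variable-length payload placed after the fixed part.
 */
struct demo_event {
	uint32_t nr_entries;		/* ordinary fixed-size field */
	uint32_t data_loc_stack;	/* low 16 bits: offset, high 16 bits: length */
	unsigned char payload[256];	/* stand-in for space reserved in the ring buffer */
};

/* Store nr stack entries in the event; returns the bytes the record uses. */
static size_t demo_event_fill(struct demo_event *ev, const uint64_t *stack,
			      uint32_t nr)
{
	size_t len = nr * sizeof(*stack);
	size_t off = offsetof(struct demo_event, payload);

	assert(len <= sizeof(ev->payload));	/* payload must fit the reservation */
	ev->nr_entries = nr;
	ev->data_loc_stack = (uint32_t)off | ((uint32_t)len << 16);
	memcpy((unsigned char *)ev + off, stack, len);
	return off + len;
}

/* Decode the locator word: returns a pointer to the payload, sets *len. */
static const void *demo_event_stack(const struct demo_event *ev, size_t *len)
{
	*len = ev->data_loc_stack >> 16;
	return (const unsigned char *)ev + (ev->data_loc_stack & 0xffff);
}
```

A consumer only needs the record base and the locator word to find the
array, so the fixed part of the event grows by 4 bytes no matter how
long the stack trace is.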

That said, I'm always very wary of adding large amounts of data to
tracepoints, especially if they are high frequency, as that just leads
to faster ring buffer exhaustion.
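
As a back-of-the-envelope illustration of that concern, the helper
below estimates how many seconds of history a buffer holds for a given
event size and rate. The function name and any numbers fed to it are
hypothetical, and it ignores per-event headers and page padding, so
treat it as a rough upper bound rather than a measurement.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rough estimate of how long a ring buffer lasts before it wraps or
 * drops events, ignoring per-event headers and page padding.
 */
static double demo_buffer_seconds(uint64_t buf_bytes, uint64_t event_bytes,
				  uint64_t events_per_sec)
{
	assert(event_bytes > 0 && events_per_sec > 0);
	return (double)buf_bytes / ((double)event_bytes * (double)events_per_sec);
}
```

For example, with an assumed 1 MiB buffer and 1024 events per second,
a 32-byte event gives about 32 seconds of history, while the same event
carrying a 16-entry stack of 8-byte addresses (160 bytes in total)
gives about 6.4 seconds.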

>
> * Even if we track dirtying state in userland, when an io is issued,
>   it needs to be mapped back to the dirtying actions.  If the dirtier
>   state is in userland, we have to export all physaddrs of pages in
>   the IO so that userland can match them up and clear dirtied states.
>   Again, the same problem.
>
> * As implemented, most of state tracking should be fairly stable and
>   shouldn't require much modification as code base evolves but it's
>   still trying to extract pretty high level semantics from disjoint
>   events across multiple layers.  It's reasonable to expect future
>   changes would require updates to how those semantics are
>   established.  Exporting higher level semantics, we don't get tied to
>   keeping the relevant raw tracepoints and, more importantly, their
>   exact interactions stable.
>
> * It isn't trivial but still pretty straightforward.  Most of what it
>   does is abbreviating stack trace to an identifier (which BTW could
>   be useful for other tracing purposes and may be worthwhile to
>   generalize) and tracking page and inode dirtiers using those
>   identifiers.  It stays mostly out of the way and doesn't noticeably
>   harm maintainability.  It fits the role of in-kernel tracers -
>   building information from domain knowledge and states and exporting
>   to userland in sensible form.
>
> Thanks.
>
> --
> tejun