Re: [PATCH 2/3] tracing: export stats of ring buffers to userspace

From: Frederic Weisbecker
Date: Fri May 01 2009 - 08:43:39 EST


On Thu, Apr 30, 2009 at 11:23:52PM -0400, Steven Rostedt wrote:
>
> On Fri, 1 May 2009, Frederic Weisbecker wrote:
>
> > On Thu, Apr 30, 2009 at 10:22:12PM -0400, Steven Rostedt wrote:
> > > From: Steven Rostedt <srostedt@xxxxxxxxxx>
> > >
> > > This patch adds stats to the ftrace ring buffers:
> > >
> > > # cat /debugfs/tracing/per_cpu/cpu0/stats
> > > entries: 42360
> > > overrun: 30509326
> > > commit overrun: 0
> > > nmi dropped: 0
> > >
> > > Where entries are the total number of data entries in the buffer.
> > >
> > > overrun is the number of entries that were not consumed and were
> > > overwritten by the writer.
> > >
> > > commit overrun is the number of entries dropped due to nested writers
> > > wrapping the buffer before the initial writer finished the commit.
> >
> >
> > I feel a bit confused with this one.
> > How can such a thing happen? The write page and the commit page
> > are not the same. So is that because we can have (ring-buffer inspires
> > all of us to try ascii-art):
> >
> >
> > Write page Commit page (which becomes new write page)
> > ------------------------------------ -----------------
> > | | | | |
> > Writer 1 | Writer 2 | Writer n | | Writer n + 1 | .....
> > reserved | reserved | reserved | | reserved |
> > ----------------------------------- ----------------
> > | ^
> > | |
> > ---------------- Was supposed to commit here--|
> >
> >
> > I know this is silly, my picture seems to show a data copy whereas
> > the ring buffer deals with page pointers.
> > But the commit page on the ring buffer is a mystery to me.
> > Just because you haven't drawn it in ascii in your comments :)
> >
>
> I have a bunch of ascii art that explains all this in my documentation
> that details the lockless version.
>
> The commit page is the page that holds the last full commit.
>
> ring_buffer_unlock_commit()
>
> On ring_buffer_lock_reserve() we reserve some data after the last commit.
>
> commit
> |
> V
> +---+ +---+ +---+ +---+
> <---| |--->| |--->| |--->| |--->
> --->| |<---| |<---| |<---| |<---
> +---+ +---+ +---+ +---+
> ^
> |
> tail (writer)
>
> We do not disable interrupts (or softirqs) between
> ring_buffer_lock_reserve and ring_buffer_unlock_commit. If we get
> preempted by an interrupt or softirq, and it writes to the ring buffer, it
> will move the tail, but not the commit. Only the outermost (non-nested)
> writer can do that:
>
> commit
> |
> V
> +---+ +---+ +---+ +---+
> <---| |--->| |--->| |--->| |--->
> --->| |<---| |<---| |<---| |<---
> +---+ +---+ +---+ +---+
> ^
> |
> tail (writer)

Aah, now I understand what rb_set_commit_to_write() does.
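
To make the rule quoted above concrete (nested writers move the tail,
only the outermost writer moves the commit), here is a toy userspace
model built around a nesting counter. It is only a sketch under
simplifying assumptions: the buffer is a flat run of slots, and
struct toy_buffer, toy_reserve(), toy_commit() and
toy_commit_while_nested() are hypothetical names, not the kernel's
implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, NOT kernel code: one per-CPU buffer seen as a flat run of slots. */
struct toy_buffer {
	size_t tail;    /* next free slot; moved by every reserve */
	size_t commit;  /* end of the last full commit */
	unsigned nest;  /* writers currently between reserve and commit */
};

/* Stand-in for ring_buffer_lock_reserve(): claims a slot, moves only the tail. */
static size_t toy_reserve(struct toy_buffer *b)
{
	b->nest++;
	return b->tail++;
}

/* Stand-in for ring_buffer_unlock_commit(): only the outermost writer
 * (nesting count back to zero) moves the commit up to the tail. */
static void toy_commit(struct toy_buffer *b)
{
	assert(b->nest > 0);
	if (--b->nest == 0)
		b->commit = b->tail;
}

/* An outer writer reserves, then is "interrupted" by nested writers that
 * each reserve and commit. Returns the commit position observed after the
 * nested writers finished but before the outer writer committed. */
static size_t toy_commit_while_nested(unsigned nested_events)
{
	struct toy_buffer b = { 0, 0, 0 };
	size_t pending;
	unsigned i;

	toy_reserve(&b);                /* outer writer, not yet committed */
	for (i = 0; i < nested_events; i++) {
		toy_reserve(&b);        /* interrupt or softirq writer */
		toy_commit(&b);         /* nested: must not move commit */
	}
	pending = b.commit;

	toy_commit(&b);                 /* outer writer finishes */
	assert(b.commit == b.tail);     /* commit caught up with the tail */
	return pending;
}
```

However many nested events are written, toy_commit_while_nested()
reports a commit position of 0 until the outer writer itself commits.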

>
> But let's say we are running the function graph tracer along with the event
> tracer. And to save space, we shrunk the ring buffer down to a few pages
> (more than 2). We are writing an event and get preempted by an interrupt
> followed by several softirqs, and these softirqs perform a lot of
> functions. It can push the tail all around the buffer:
>
> commit
> |
> V
> +---+ +---+ +---+ +---+
> <---| |--->| |--->| |--->| |--->
> --->| |<---| |<---| |<---| |<---
> +---+ +---+ +---+ +---+
> ^
> |
> tail (writer)
>
>
> This happens before we even finish the original write. But the tail
> cannot push the commit forward, because when we fall out of this stack of
> writers, that original writer is in the process of writing into the ring
> buffer. Thus we need to drop any more entries that want to push the tail
> pointer onto the commit pointer.
>
> Thus, when this happens we record it with commit_overrun.
>
> -- Steve

Ok thanks for these explanations!
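
For the archive, the wrap-around case can be played through with a toy
userspace model. This is only a sketch under stated assumptions, not
the kernel code: the page count, the slots-per-page value and every
toy_* name are made up for illustration, and readers and ordinary
overwrite ("overrun") are left out entirely.

```c
#include <assert.h>

#define TOY_PAGES          4  /* "a few pages (more than 2)" */
#define TOY_SLOTS_PER_PAGE 2  /* hypothetical page capacity */

/* Toy model, NOT kernel code. */
struct toy_ring {
	unsigned tail_page;    /* page the writers are currently filling */
	unsigned tail_used;    /* slots already reserved on the tail page */
	unsigned commit_page;  /* page that holds the last full commit */
	unsigned nest;         /* writers between reserve and commit */
	unsigned long commit_overrun;
};

/* Returns 1 if a slot was reserved, 0 if the entry had to be dropped. */
static int toy_ring_reserve(struct toy_ring *r)
{
	if (r->tail_used == TOY_SLOTS_PER_PAGE) {
		unsigned next = (r->tail_page + 1) % TOY_PAGES;

		/* The tail would step onto the page of a still pending
		 * commit: drop the entry and count it. */
		if (next == r->commit_page) {
			r->commit_overrun++;
			return 0;
		}
		r->tail_page = next;
		r->tail_used = 0;
	}
	r->tail_used++;
	r->nest++;
	return 1;
}

/* Only the outermost writer moves the commit page up to the tail page. */
static void toy_ring_commit(struct toy_ring *r)
{
	assert(r->nest > 0);
	if (--r->nest == 0)
		r->commit_page = r->tail_page;
}

/* One outer writer is interrupted by nested_events nested writers before
 * it commits. Returns the number of entries counted as commit overrun. */
static unsigned long toy_interrupted_write(unsigned nested_events)
{
	struct toy_ring r = { 0 };
	unsigned i;

	toy_ring_reserve(&r);                 /* original writer */
	for (i = 0; i < nested_events; i++)
		if (toy_ring_reserve(&r))     /* interrupt/softirq writer */
			toy_ring_commit(&r);
	toy_ring_commit(&r);                  /* original writer finishes */
	return r.commit_overrun;
}
```

With these numbers the buffer has 8 slots and the original writer holds
one of them, so 7 nested events still fit. Each further nested event
would push the tail onto the commit page and is dropped instead:
toy_interrupted_write(10) counts 3 of them.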

I hope your lockless ring buffer will be posted soon (any news about
the patent?) :-)

