Re: [PATCH v11 2/5] ring-buffer: Introducing ring-buffer mapping functions

From: Vincent Donnefort
Date: Fri Jan 12 2024 - 04:13:27 EST


On Thu, Jan 11, 2024 at 06:23:20PM -0500, Steven Rostedt wrote:
> On Thu, 11 Jan 2024 11:34:58 -0500
> Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>
>
> > The LTTng kernel tracer has supported mmap'd buffers for nearly 15 years [1],
> > and has a lot of similarities with this patch series.
> >
> > LTTng has the notion of "subbuffer id" to allow atomically exchanging a
> > "reader" extra subbuffer with the subbuffer to be read. It implements
> > "get subbuffer" / "put subbuffer" ioctls to allow the consumer (reader)
> > to move the currently read subbuffer position. [2]
> >
> > It would not hurt to compare your approach to LTTng and highlight
> > similarities/differences, and the rationale for the differences.
> >
> > Especially when it comes to designing kernel ABIs, it's good to make sure
> > that all bases are covered, because those choices will have lasting impacts.
> >
> > Thanks,
> >
> > Mathieu
> >
> > [1] https://git.lttng.org/?p=lttng-modules.git;a=blob;f=src/lib/ringbuffer/ring_buffer_mmap.c
> > [2] https://git.lttng.org/?p=lttng-modules.git;a=blob;f=src/lib/ringbuffer/ring_buffer_vfs.c
> >
>
> Hi Mathieu,
>
> Thanks for sharing!
>
> As we discussed a little bit in the tracing meeting we do somethings
> differently but very similar too ;-)
>
> The similarities as that all the sub-buffers are mapped. You have a
> reader-sub-buffer as well.
>
> The main difference is that you use an ioctl() that returns where to find
> the reader-sub-buffer, where our ioctl() is just "I'm done, get me a new
> reader-sub-buffer". Ours will update the meta page.
>
> Our meta page looks like this:
>
> > +struct trace_buffer_meta {
> > + unsigned long entries;
> > + unsigned long overrun;
> > + unsigned long read;
>
> If start tracing: trace-cmd start -e sched_switch and do:
>
> ~ cat /sys/kernel/tracing/per_cpu/cpu0/stats
> entries: 14
> overrun: 0
> commit overrun: 0
> bytes: 992
> oldest event ts: 84844.825372
> now ts: 84847.102075
> dropped events: 0
> read events: 0
>
> You'll see similar to the above.
>
> entries = entries
> overrun = overrun
> read = read
>
> The above "read" is total number of events read.
>
> Pretty staight forward ;-)
>
>
> > +
> > + unsigned long subbufs_touched;
> > + unsigned long subbufs_lost;
> > + unsigned long subbufs_read;
>
> Now I'm thinking we may not want this exported, as I'm not sure it's useful.

touched and lost are not useful now, but it'll be for my support of the
hypervisor tracing, that's why I added them already.

subbufs_read could probably go away though as even in that case I can track that
in the reader.

>
> Vincent, talking with Mathieu, he was suggesting that we only export what
> we really need, and I don't think we need the above. Especially since they
> are not exposed in the stats file.
>
>
> > +
> > + struct {
> > + unsigned long lost_events;
> > + __u32 id;
> > + __u32 read;
> > + } reader;
>
> The above is definitely needed, as all of it is used to read the
> reader-page of the sub-buffer.
>
> lost_events is the number of lost events that happened before this
> sub-buffer was swapped out.
>
> Hmm, Vincent, I just notice that you update the lost_events as:
>
> > +static void rb_update_meta_page(struct ring_buffer_per_cpu *cpu_buffer)
> > +{
> > + struct trace_buffer_meta *meta = cpu_buffer->meta_page;
> > +
> > + WRITE_ONCE(meta->entries, local_read(&cpu_buffer->entries));
> > + WRITE_ONCE(meta->overrun, local_read(&cpu_buffer->overrun));
> > + WRITE_ONCE(meta->read, cpu_buffer->read);
> > +
> > + WRITE_ONCE(meta->subbufs_touched, local_read(&cpu_buffer->pages_touched));
> > + WRITE_ONCE(meta->subbufs_lost, local_read(&cpu_buffer->pages_lost));
> > + WRITE_ONCE(meta->subbufs_read, local_read(&cpu_buffer->pages_read));
> > +
> > + WRITE_ONCE(meta->reader.read, cpu_buffer->reader_page->read);
> > + WRITE_ONCE(meta->reader.id, cpu_buffer->reader_page->id);
> > + WRITE_ONCE(meta->reader.lost_events, cpu_buffer->lost_events);
> > +}
>
> The lost_events may need to be handled differently, as it doesn't always
> get cleared. So it may be stale data.

My idea was to check this value after the ioctl(). If > 0 then events were lost
between the that ioctl() and the previous swap.

But now if with "[PATCH] ring-buffer: Have mmapped ring buffer keep track of
missed events" we put this information in the page itself, we can get rid of
this field.

>
>
> > +
> > + __u32 subbuf_size;
> > + __u32 nr_subbufs;
>
> This gets is the information needed to read the mapped ring buffer.
>
> > +
> > + __u32 meta_page_size;
> > + __u32 meta_struct_len;
>
> The ring buffer gets mapped after "meta_page_size" and this structure is
> "meta_struct_len" which will change if we add new data to it.
>
> > +};
>
> -- Steve