Re: [PATCH] ring-buffer: Fix buffer max_data_size with max_event_size

From: Steven Rostedt
Date: Tue Dec 12 2023 - 06:43:58 EST


On Mon, 11 Dec 2023 20:40:33 +0900
Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx> wrote:

> On Sat, 9 Dec 2023 17:09:25 -0500
> Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> > On Sat, 9 Dec 2023 17:01:39 -0500
> > Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> >
> > > From: "Steven Rostedt (Google)" <rostedt@xxxxxxxxxxx>
> > >
> > > The maximum ring buffer data size is the maximum size of data that can be
> > > recorded on the ring buffer. Events must be smaller than the sub buffer
> > > data size minus any meta data. This size is checked before trying to
> > > allocate from the ring buffer because the allocation assumes that the size
> > > will fit on the sub buffer.
> > >
> > > The maximum size was calculated as the size of a sub buffer page (which is
> > > currently PAGE_SIZE minus the sub buffer header) minus the size of the
> > > meta data of an individual event. But it missed the possible adding of a
> > > time stamp for events that are added long enough apart that the event meta
> > > data can't hold the time delta.
> > >
> > > When an event is added that is greater than the current BUF_MAX_DATA_SIZE
> > > minus the size of a time stamp, but still less than or equal to
> > > BUF_MAX_DATA_SIZE, the ring buffer would go into an infinite loop, looking
> > > for a page that can hold the event. Luckily, there's a check for this loop
> > > and after 1000 iterations and a warning is emitted and the ring buffer is
> > > disabled. But this should never happen.
> > >
> > > This can happen when a large event is added first, or after a long period
> > > where an absolute timestamp is prefixed to the event, increasing its size
> > > by 8 bytes. This passes the check and then goes into the algorithm that
> > > causes the infinite loop.
> > >
> > > Fix this by creating a BUF_MAX_EVENT_SIZE to be used to determine if the
> > > passed in event is too big for the buffer.
> > >
> >
> > Forgot to add:
> >
> > Cc: stable@xxxxxxxxxxxxxxx
> > Fixes: a4543a2fa9ef3 ("ring-buffer: Get timestamp after event is allocated")
>
> Looks good to me.
>
> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>

Actually, I found out that this can be fixed with:

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 86a60a0eb279..aaad104a1707 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -3594,7 +3594,7 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
info->length += RB_LEN_TIME_EXTEND;
} else {
info->delta = info->ts - info->after;
- if (unlikely(test_time_stamp(info->delta))) {
+ if (w && unlikely(test_time_stamp(info->delta))) {
info->add_timestamp |= RB_ADD_STAMP_EXTEND;
info->length += RB_LEN_TIME_EXTEND;
}

The bug that this patch fixed was that the size that was acceptable did not take
into account the added time stamp. But the time stamp should *not* be added
if it's the first event on the sub buffer. And once it goes to the next
buffer, it should be the first event.

I ran the same tests with this change, and it works just as well. I believe
this is the proper fix.

I'll send a v2.

Thanks!

-- Steve


>
> Thanks,
> >
> > -- Steve
> >
> >
> > > Reported-by: Kent Overstreet <kent.overstreet@xxxxxxxxx> # (on IRC)
> > > Signed-off-by: Steven Rostedt (Google) <rostedt@xxxxxxxxxxx>
> > > ---
> > > kernel/trace/ring_buffer.c | 7 +++++--
> > > 1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > > index 8d2a4f00eca9..a38e5a3c6803 100644
> > > --- a/kernel/trace/ring_buffer.c
> > > +++ b/kernel/trace/ring_buffer.c
> > > @@ -378,6 +378,9 @@ static inline bool test_time_stamp(u64 delta)
> > > /* Max payload is BUF_PAGE_SIZE - header (8bytes) */
> > > #define BUF_MAX_DATA_SIZE (BUF_PAGE_SIZE - (sizeof(u32) * 2))
> > >
> > > +/* Events may have a time stamp attached to them */
> > > +#define BUF_MAX_EVENT_SIZE (BUF_MAX_DATA_SIZE - RB_LEN_TIME_EXTEND)
> > > +
> > > int ring_buffer_print_page_header(struct trace_seq *s)
> > > {
> > > struct buffer_data_page field;
> > > @@ -3810,7 +3813,7 @@ ring_buffer_lock_reserve(struct trace_buffer *buffer, unsigned long length)
> > > if (unlikely(atomic_read(&cpu_buffer->record_disabled)))
> > > goto out;
> > >
> > > - if (unlikely(length > BUF_MAX_DATA_SIZE))
> > > + if (unlikely(length > BUF_MAX_EVENT_SIZE))
> > > goto out;
> > >
> > > if (unlikely(trace_recursive_lock(cpu_buffer)))
> > > @@ -3960,7 +3963,7 @@ int ring_buffer_write(struct trace_buffer *buffer,
> > > if (atomic_read(&cpu_buffer->record_disabled))
> > > goto out;
> > >
> > > - if (length > BUF_MAX_DATA_SIZE)
> > > + if (length > BUF_MAX_EVENT_SIZE)
> > > goto out;
> > >
> > > if (unlikely(trace_recursive_lock(cpu_buffer)))
> >
>
>