Re: [PATCH] ring-buffer: Fix wake ups when buffer_percent is set to 100

From: Steven Rostedt
Date: Thu Dec 28 2023 - 23:05:31 EST


On Wed, 27 Dec 2023 07:57:08 +0900
Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx> wrote:

> On Tue, 26 Dec 2023 12:59:02 -0500
> Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> > From: "Steven Rostedt (Google)" <rostedt@xxxxxxxxxxx>
> >
> > The tracefs file "buffer_percent" is to allow user space to set a
> > water-mark on how much of the tracing ring buffer needs to be filled in
> > order to wake up a blocked reader.
> >
> > 0 - is to wait until any data is in the buffer
> > 1 - is to wait for 1% of the sub buffers to be filled
> > 50 - would be half of the sub buffers are filled with data
> > 100 - is not to wake the waiter until the ring buffer is completely full
> >
> > Unfortunately the test for being full was:
> >
> > dirty = ring_buffer_nr_dirty_pages(buffer, cpu);
> > return (dirty * 100) > (full * nr_pages);
> >
> > Where "full" is the value for "buffer_percent".
> >
> > There is two issues with the above when full == 100.
> >
> > 1. dirty * 100 > 100 * nr_pages will never be true
> > That is, the above is basically saying that if the user sets
> > buffer_percent to 100, more pages need to be dirty than exist in the
> > ring buffer!
> >
> > 2. The page that the writer is on is never considered dirty, as dirty
> > pages are only those that are full. When the writer goes to a new
> > sub-buffer, it clears the contents of that sub-buffer.
> >
> > That is, even if the check was ">=" it would still not be equal as the
> > most pages that can be considered "dirty" is nr_pages - 1.
> >
> > To fix this, add one to dirty and use ">=" in the compare.
> >
> > Cc: stable@xxxxxxxxxxxxxxx
> > Fixes: 03329f9939781 ("tracing: Add tracefs file buffer_percentage")
> > Signed-off-by: Steven Rostedt (Google) <rostedt@xxxxxxxxxxx>
> > ---
> > kernel/trace/ring_buffer.c | 9 +++++++--
> > 1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > index 83eab547f1d1..32c0dd2fd1c3 100644
> > --- a/kernel/trace/ring_buffer.c
> > +++ b/kernel/trace/ring_buffer.c
> > @@ -881,9 +881,14 @@ static __always_inline bool full_hit(struct trace_buffer *buffer, int cpu, int f
> > if (!nr_pages || !full)
> > return true;
> >
> > - dirty = ring_buffer_nr_dirty_pages(buffer, cpu);
> > + /*
> > + * Add one as dirty will never equal nr_pages, as the sub-buffer
> > + * that the writer is on is not counted as dirty.
> > + * This is needed if "buffer_percent" is set to 100.
> > + */
> > + dirty = ring_buffer_nr_dirty_pages(buffer, cpu) + 1;
>
> Is this "+ 1" required? If we have 200 pages and 1 buffer is dirty,
> it is 0.5% dirty. Consider @full = 1%.

Yes it is required, as the comment above it states. dirty will never
equal nr_pages. Without it, buffer_percent == 100 will never wake up.

The +1 is to add the page the writer is on, which is never considered
"dirty".

>
> @dirty = 1 + 1 = 2 and @dirty * 100 == 200. but
> @full * @nr_pages = 1 * 200 = 200.
> Thus it hits (200 >= 200 is true) even if dirty pages are 0.5%.

Do we care?

What's the difference if it wakes up on 2 dirty pages or 1? It would be
very hard to measure the difference.

But if you say 100, which means "I want to wake up when full" it will
never wake up. Because it will always be nr_pages - 1.

We could also say the +1 is the reader page too, because that's not
counted as well.

In other words, we can bike shed this to make 1% accurate (which
honestly, I have no idea what the use case for that would be) or we can
fix the bug that has 100% which just means, wake me up if the buffer is
full, and when the writer is on the last page, it is considered full.

-- Steve