Re: [PATCH v8 0/2] Introducing trace buffer mapping by user-space

From: Steven Rostedt
Date: Wed Dec 20 2023 - 08:28:49 EST


On Wed, 20 Dec 2023 13:06:06 +0000
Vincent Donnefort <vdonnefort@xxxxxxxxxx> wrote:

> > @@ -771,10 +772,20 @@ static void rb_update_meta_page(struct ring_buffer_per_cpu *cpu_buffer)
> >  static void rb_wake_up_waiters(struct irq_work *work)
> >  {
> >  	struct rb_irq_work *rbwork = container_of(work, struct rb_irq_work, work);
> > -	struct ring_buffer_per_cpu *cpu_buffer =
> > -		container_of(rbwork, struct ring_buffer_per_cpu, irq_work);
> > +	struct ring_buffer_per_cpu *cpu_buffer;
> > +	struct trace_buffer *buffer;
> > +	int cpu;
> >
> > -	rb_update_meta_page(cpu_buffer);
> > +	if (rbwork->is_cpu_buffer) {
> > +		cpu_buffer = container_of(rbwork, struct ring_buffer_per_cpu, irq_work);
> > +		rb_update_meta_page(cpu_buffer);
> > +	} else {
> > +		buffer = container_of(rbwork, struct trace_buffer, irq_work);
> > +		for_each_buffer_cpu(buffer, cpu) {
> > +			cpu_buffer = buffer->buffers[cpu];
> > +			rb_update_meta_page(cpu_buffer);
> > +		}
> > +	}
>
> Arg, somehow never reproduced the problem :-\. I suppose you need to cat
> trace/trace_pipe and mmap(trace/cpuX/trace_pipe) at the same time?

It triggered as soon as I ran "trace-cmd start -e sched_switch"

In other words, it broke the non-mmap case. This function gets called for
both the buffer and the cpu_buffer irq_work entries. You added the
container_of() to get access to cpu_buffer, but the rbwork could also
belong to the main buffer. The main buffer has no meta page, so this
triggered a NULL pointer dereference: "cpu_buffer->mapped" read as true
(because that offset landed on some field of the buffer structure that
wasn't zero), and then here:

	if (cpu_buffer->mapped) {
		WRITE_ONCE(cpu_buffer->meta_page->reader.read, 0);

It dereferenced cpu_buffer->meta_page->reader

which is only God knows what!
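
To make the failure mode concrete, here is a reduced user-space sketch of
the check in the hunk quoted above. It is not the kernel code: the struct
layouts, NR_CPUS, update_meta_page(), wake_up_waiters() and run_demo() are
all made up for illustration, and container_of() is a plain offsetof()
stand-in. The point is only that one callback serves two different
enclosing structures, so it has to look at is_cpu_buffer before deciding
which container_of() to apply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define NR_CPUS 2	/* hypothetical, for the demo only */

struct irq_work { int pending; };

struct rb_irq_work {
	struct irq_work work;
	bool is_cpu_buffer;	/* true only when embedded in a cpu_buffer */
};

struct meta_page { unsigned long read; };

/* Heavily reduced stand-in for struct ring_buffer_per_cpu. */
struct cpu_buffer {
	int cpu;
	struct rb_irq_work irq_work;
	bool mapped;
	struct meta_page *meta_page;
	int meta_updates;	/* demo-only counter */
};

/* Heavily reduced stand-in for struct trace_buffer: no meta page here. */
struct trace_buffer {
	struct cpu_buffer *buffers[NR_CPUS];
	struct rb_irq_work irq_work;
};

static void update_meta_page(struct cpu_buffer *cpu_buffer)
{
	cpu_buffer->meta_updates++;
	if (cpu_buffer->mapped)
		cpu_buffer->meta_page->read = 0;
}

/* One callback for both kinds of irq_work; the flag picks the container. */
static void wake_up_waiters(struct irq_work *work)
{
	struct rb_irq_work *rbwork = container_of(work, struct rb_irq_work, work);
	struct cpu_buffer *cpu_buffer;
	struct trace_buffer *buffer;
	int cpu;

	if (rbwork->is_cpu_buffer) {
		cpu_buffer = container_of(rbwork, struct cpu_buffer, irq_work);
		update_meta_page(cpu_buffer);
	} else {
		/* Main buffer: walk the per-CPU buffers instead of casting. */
		buffer = container_of(rbwork, struct trace_buffer, irq_work);
		for (cpu = 0; cpu < NR_CPUS; cpu++)
			update_meta_page(buffer->buffers[cpu]);
	}
}

/* Fires both kinds of work and returns the total number of updates. */
int run_demo(void)
{
	struct meta_page page = { .read = 42 };
	struct cpu_buffer cpu0 = { .cpu = 0, .irq_work.is_cpu_buffer = true,
				   .mapped = true, .meta_page = &page };
	struct cpu_buffer cpu1 = { .cpu = 1, .irq_work.is_cpu_buffer = true };
	struct trace_buffer buffer = { .buffers = { &cpu0, &cpu1 } };

	wake_up_waiters(&cpu0.irq_work.work);	/* per-CPU work: cpu0 only */
	assert(page.read == 0);
	wake_up_waiters(&buffer.irq_work.work);	/* main buffer work: all CPUs */

	return cpu0.meta_updates + cpu1.meta_updates;
}
```

Without the is_cpu_buffer test, the second call would treat the
trace_buffer as a cpu_buffer and read "mapped" and "meta_page" from
whatever happens to sit at those offsets.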

>
> Updating the meta-page is only useful if the reader we are waking up is a
> user-space one, which would only happen with the cpu_buffer version of this
> function. We could limit the update of the meta_page only to this case?

I'd rather not add another irq_work entry. This workaround should be good
enough.

Thanks,

-- Steve