Re: [PATCH] tracing: Fix to avoid wakeup loop in splice read of per-cpu buffer

From: Google
Date: Mon Aug 21 2023 - 19:51:08 EST


On Mon, 21 Aug 2023 11:27:03 -0400
Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:

> On Tue, 22 Aug 2023 00:16:39 +0900
> Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx> wrote:
>
> > BTW, even with this fix, blocking splice still causes a strange behavior.
> > If I set '400' to buffer_size_kb (so 100 pages) and '1' to buffer_percent,
> > splice always returns 8192 (2 pages) to read. But I expected that should
> > return 4096 (1 page). This means splice() waits longer than I thought.
> >
> > I think the fullfilled percentage calculation will be a bit wrong.
>
> Note, the percentage is when to wake up the task. If between the wakeup and
> the read/splice, another ring buffer page gets filled more, then it will
> give that as well. The buffer_percent is just how long to wait, not for how
> much to read.

Yes, but I used the trace_marker as Zheng did for testing, and I saw the
buffer is filled more than 4096 bytes via tracefs/per_cpu/cpu*/stats.

>
> Now if you test this with just adding enough to fill one page, and it
> doesn't wake up the reader, then that would be a bug.

Yes.

Run trace_agent with 1% buffer_percent == 1 page;

# echo 400 > /sys/kernel/tracing/buffer_size_kb
# echo 1 > /sys/kernel/tracing/buffer_percent
# trace-agent &

Ready to read in the host side,

$ for i in `seq 0 7`; do cat trace-path-cpu$i.out > trace-data.$i & done
$ echo 1 > agent-ctl-path.in

Write events on per-cpu buffer in the guest
# for i in `seq 1 1000`; do echo "test event data $i" | tee /sys/kernel/tracing/trace_marker > /dev/null; done

But trace_agent doesn't wake up. The data is still there.

# grep bytes /sys/kernel/tracing/per_cpu/cpu*/stats
/sys/kernel/tracing/per_cpu/cpu0/stats:bytes: 5936
/sys/kernel/tracing/per_cpu/cpu1/stats:bytes: 5584
/sys/kernel/tracing/per_cpu/cpu2/stats:bytes: 6368
/sys/kernel/tracing/per_cpu/cpu3/stats:bytes: 4704
/sys/kernel/tracing/per_cpu/cpu4/stats:bytes: 5972
/sys/kernel/tracing/per_cpu/cpu5/stats:bytes: 5620
/sys/kernel/tracing/per_cpu/cpu6/stats:bytes: 6588
/sys/kernel/tracing/per_cpu/cpu7/stats:bytes: 3496

And write more events via trace_marker again

# for i in `seq 1001 2000`; do echo "test event data $i" | tee /sys/kernel/tracing/trace_marker > /dev/null; done

The data is read;

# grep bytes /sys/kernel/tracing/per_cpu/cpu*/stats
/sys/kernel/tracing/per_cpu/cpu0/stats:bytes: 3220
/sys/kernel/tracing/per_cpu/cpu1/stats:bytes: 3960
/sys/kernel/tracing/per_cpu/cpu2/stats:bytes: 2420
/sys/kernel/tracing/per_cpu/cpu3/stats:bytes: 2024
/sys/kernel/tracing/per_cpu/cpu4/stats:bytes: 2912
/sys/kernel/tracing/per_cpu/cpu5/stats:bytes: 1064
/sys/kernel/tracing/per_cpu/cpu6/stats:bytes: 5004
/sys/kernel/tracing/per_cpu/cpu7/stats:bytes: 2684

I think this full_hit() function is somewhat wrong.

static __always_inline bool full_hit(struct trace_buffer *buffer, int cpu, int full)
{
...
return (dirty * 100) > (full * nr_pages);
}

Ah, I also found a mistake on this patch too.

Thanks,

--
Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>