Re: [PATCH] tracing: Fix race when concurrently splice_read trace_pipe

From: Zheng Yejian
Date: Sat Aug 12 2023 - 03:38:21 EST


On 2023/8/12 03:25, Steven Rostedt wrote:
> On Thu, 10 Aug 2023 20:39:05 +0800
> Zheng Yejian <zhengyejian1@xxxxxxxxxx> wrote:
>
> > When concurrently splice_read file trace_pipe and per_cpu/cpu*/trace_pipe,
> > there are more data being read out than expected.
>
> Honestly the real fix is to prevent that use case. We should probably have
> access to trace_pipe lock all the per_cpu trace_pipes too.
>
> -- Steve


Hi~

The reproduction test case is shown below. It always reproduces the
issue on v5.10, and with this patch applied the test case passes.

On v5.10, running `cat trace_pipe > /tmp/myfile &` calls sendfile() to
transfer data from trace_pipe into /tmp/myfile. In the kernel, this
invokes the .splice_read() handler of trace_pipe, and the issue is
reproduced.

On the latest v6.5, however, this test case no longer reaches the
.splice_read() handler of trace_pipe: since commit 97ef77c52b78 ("fs:
check FMODE_LSEEK to control internal pipe splicing"), the non-seekable
trace_pipe can no longer be used as a sendfile() source.
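
Whether the splice path is exercised depends on both the kernel and the
`cat` implementation, so it can help to check what a given setup really
does. The following is only a sketch and not part of the patch: it
assumes `strace` is installed, and `classify_copy_path` and the log path
/tmp/cat.strace are hypothetical names. The helper only inspects a saved
strace log and reports whether a sendfile() call returned successfully,
failed, or was never issued.

```shell
#!/bin/bash

# Hypothetical helper: classify how a traced 'cat' copied its data,
# based on a log file written by 'strace -o <log>'.
classify_copy_path()
{
    local log=$1

    if grep -qE 'sendfile(64)?\(.*\) += [0-9]+$' "${log}"; then
        # At least one sendfile() call returned a non-negative value.
        echo "sendfile"
    elif grep -qE 'sendfile(64)?\(.*\) += -1 ' "${log}"; then
        # sendfile() was tried but every completed call failed.
        echo "sendfile-failed"
    else
        # No completed sendfile() call in the log at all.
        echo "no-sendfile"
    fi
}

# Example (assumption: run by hand as root, stop cat with Ctrl-C):
#   strace -f -o /tmp/cat.strace -e trace=sendfile,splice,read \
#       cat /sys/kernel/tracing/trace_pipe > /tmp/myfile
#   classify_copy_path /tmp/cat.strace
```

A result of "sendfile" suggests the setup can hit .splice_read() as
described for v5.10, while "sendfile-failed" or "no-sendfile" suggests
the data went through plain read() instead.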

``` repro.sh
#!/bin/bash


do_test()
{
    local trace_dir=/sys/kernel/tracing
    local trace=${trace_dir}/trace
    local old_trace_lines
    local new_trace_lines
    local tempfiles
    local testlog="trace pipe concurrency issue"
    local pipe_pids
    local i
    local write_cnt=1000
    local read_cnt=0
    local nr_cpu=`nproc`

    # 1. First, clear the whole ring buffer.
    echo > ${trace}

    # 2. Count how many lines the trace file has now.
    old_trace_lines=`cat ${trace} | wc -l`

    # 3. Set the watermark to 0 so that readers wake up as soon as an
    #    event arrives.
    echo 0 > ${trace_dir}/buffer_percent

    # 4. Read the per-CPU trace_pipes into local files in the background.
    #    'cat' must use splice read here, otherwise the race cannot be
    #    reproduced!
    i=0
    while [ ${i} -lt ${nr_cpu} ]; do
        tempfiles[${i}]=/tmp/percpu_trace_pipe_${i}
        cat ${trace_dir}/per_cpu/cpu${i}/trace_pipe > ${tempfiles[${i}]} &
        pipe_pids[${i}]=$!
        let i=i+1
    done

    # 5. Read the main trace_pipe into a local file in the background.
    #    Again, splice read must be used to reproduce the issue!
    tempfiles[${i}]=/tmp/main_trace_pipe
    cat ${trace_dir}/trace_pipe > ${tempfiles[${i}]} &
    pipe_pids[${i}]=$!

    echo "Take a break, let readers run."
    sleep 3

    # 6. Write events into the ring buffer through trace_marker, so that
    #    the hungry readers start racing for these events.
    i=0
    while [ ${i} -lt ${write_cnt} ]; do
        echo "${testlog} <${i}>" > ${trace_dir}/trace_marker
        let i=i+1
    done

    # 7. Wait until all events have been consumed.
    new_trace_lines=`cat ${trace} | wc -l`
    while [ "${new_trace_lines}" != "${old_trace_lines}" ]; do
        new_trace_lines=`cat ${trace} | wc -l`
        sleep 1
    done
    echo "All written events have been consumed."

    # 8. Kill all readers and count the events they read.
    i=0
    while [ ${i} -lt ${#pipe_pids[*]} ]; do
        local num

        kill -9 ${pipe_pids[${i}]}
        wait ${pipe_pids[${i}]}
        num=`cat ${tempfiles[${i}]} | grep "${testlog}" | wc -l`
        let read_cnt=read_cnt+num
        let i=i+1
    done

    # 9. Expect to read exactly as many events as were written.
    if [ "${read_cnt}" != "${write_cnt}" ]; then
        echo "Test failed: wrote ${write_cnt} events but read ${read_cnt} !!!"
        return 1
    fi

    # 10. Clean up the temp files if the test succeeded.
    i=0
    while [ ${i} -lt ${#tempfiles[*]} ]; do
        rm ${tempfiles[${i}]}
        let i=i+1
    done
    return 0
}

do_test
```
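
Since repro.sh keeps its temp files when the test fails, they can be
inspected afterwards. As a rough sketch (not part of the patch, and
`find_duplicate_events` is a hypothetical name), the helper below lists
the events that show up more than once across the captured files. It
assumes the extra data consists of events that were read twice and that
each event line still contains the "trace pipe concurrency issue <N>"
text written in step 6.

```shell
#!/bin/bash

# Hypothetical helper: print every test event that appears more than
# once across the given capture files.
find_duplicate_events()
{
    local testlog="trace pipe concurrency issue"

    # -h drops the file name prefix, -o prints only the matched text,
    # so identical events from different files compare equal.
    grep -ho "${testlog} <[0-9]*>" "$@" | sort | uniq -d
}

# Example, after a failed run of repro.sh:
#   find_duplicate_events /tmp/percpu_trace_pipe_* /tmp/main_trace_pipe
```

An empty result with a mismatching count would point at something other
than plain duplication, for example lost or truncated events.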

-- Zheng Yejian