races when reserving an event in the unified trace buffer

From: Jiaying Zhang
Date: Fri Dec 12 2008 - 19:27:26 EST


Hi Steve,

I am doing some load testing with our kernel tracing prototype,
which uses the unified trace buffer to manage its data. I have
sometimes seen a kernel stack dump triggered by the following
check in __rb_reserve_next():
	if (unlikely(next_page == cpu_buffer->commit_page)) {
		WARN_ON_ONCE(1);
		goto out_unlock;
	}
The comment above the code says the problem is caused by
"an interrupt storm that made it all the way around the buffer".
But I think there is a race here where a single interrupt can make
the check trigger. Suppose this is what happens:
An event is traced and calls __rb_reserve_next. Right after it
reads the current tail_page (the line "tail_page = cpu_buffer->tail_page;"),
an interrupt arrives that is also traced. The interrupt sees the
same tail_page, and because the first event has not yet updated
'write', it gets the old 'write' value. If the tail_page is full,
the interrupt event moves it forward, and it may also advance the
commit_page if commit_page was the same as tail_page. As a result,
when the interrupt finishes and the first event resumes, its
next_page equals the new commit_page and the warning fires even
though nothing made it all the way around the buffer.
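
To make the interleaving concrete, here is a minimal user-space
sketch of the scenario. The names (rb_model, page_model, reserve(),
NR_PAGES) are made up for illustration and only roughly model the
per-cpu buffer; this is not the real kernel code. Page 0 starts out
nearly full and is both the tail page and the commit page:

#include <stdio.h>

#define BUF_PAGE_SIZE	4096
#define NR_PAGES	4

struct page_model {
	unsigned long write;			/* next write offset on this page */
};

struct rb_model {
	struct page_model pages[NR_PAGES];
	int tail_page;				/* page currently being written */
	int commit_page;			/* page currently being committed */
};

/*
 * Rough model of the reserve path: snapshot tail_page, bump 'write',
 * and switch pages on overflow.  A non-zero irq_len injects a traced
 * "interrupt" right after tail_page is snapshotted, i.e. in the
 * window described above.
 */
static void reserve(struct rb_model *rb, unsigned long len, unsigned long irq_len)
{
	int tail_page = rb->tail_page;		/* stale once the interrupt below runs */

	if (irq_len)
		reserve(rb, irq_len, 0);	/* the traced interrupt, fully committed */

	rb->pages[tail_page].write += len;
	if (rb->pages[tail_page].write > BUF_PAGE_SIZE) {
		int next_page = (tail_page + 1) % NR_PAGES;

		if (next_page == rb->commit_page) {
			printf("false positive: next_page == commit_page without a wrap-around\n");
			return;
		}
		rb->tail_page = next_page;
		rb->pages[next_page].write = len;
	}
	/* in this simplified model the commit simply follows the tail */
	rb->commit_page = rb->tail_page;
}

int main(void)
{
	struct rb_model rb = { 0 };

	rb.pages[0].write = BUF_PAGE_SIZE - 8;	/* page 0 nearly full */
	reserve(&rb, 64, 64);			/* 64-byte event, interrupted by a traced 64-byte event */
	return 0;
}

The interrupt overflows page 0, moves both tail_page and commit_page
to page 1, and when the outer event resumes with its stale tail_page
its next_page is exactly the new commit_page, so the "interrupt storm"
warning prints after just one interrupt.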

I have seen the problem happen frequently under heavy load
on a multi-core machine. Interestingly, I have also seen instances
of the warning that might actually be caused by an interrupt storm.
I was using a 64k buffer size, though, and I am not sure whether it
is possible for that many interrupts to happen in such a short time window.

I think we can use the page time_stamp to distinguish the two cases.
Also, in either case, it seems bad to leave tail_page->write with
an invalid value, because that can cause problems when a reader
reads the page. Here is my proposed fix:

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 7f69cfe..1500f78 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -982,8 +982,11 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
 		 * it all the way around the buffer, bail, and warn
 		 * about it.
 		 */
-		if (unlikely(next_page == cpu_buffer->commit_page)) {
+		if (unlikely(next_page == cpu_buffer->commit_page) &&
+		    tail_page->time_stamp > next_page->time_stamp) {
 			WARN_ON_ONCE(1);
+			if (tail <= BUF_PAGE_SIZE)
+				local_set(&tail_page->write, tail);
 			goto out_unlock;
 		}


Jiaying