Re: perf counter issue - WARN_ON_ONCE(!list_empty(&tsk->perf_counter_ctx.counter_list));

From: Paul Mackerras
Date: Mon May 18 2009 - 01:01:50 EST


Peter Zijlstra writes:

> OK, so the cleanup isn't solid.. I've been poking at things, and below
> is the current state of my tinkering, but it seems to make things
> worse...
>
> With only the callback in do_exit() the above test works but hackbench
> fails, with only the call in wait_task_zombie() hackbench works and the
> above fails.
>
> With both, we segfault the kernel on a list op on either :-)

I don't know if this is the problem, but I have noticed a basic
lifetime issue: a counter on a task points to a context which is
embedded in the task_struct of the task being counted, but the counter
might outlive the task. For example, task A puts a counter on task B,
task B dies and is reaped by its parent, but the counter still exists
because task A hasn't closed its fd. When task A does close the fd,
perf_release will call perf_counter_remove_from_context, which will
use counter->ctx. But that context is embedded in B's task struct,
which has already gone away.

I want to change the task struct to have just a pointer to the context
rather than the context struct itself for other reasons (it will make
it much easier to implement lazy PMU switching). If we do that, we
could refcount the context and solve the lifetime issue that way. I'm
working on a patch; hopefully I'll have more to report later today.
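To make the refcounting idea concrete, here is a minimal userspace sketch, assuming the task holds a pointer to a separately allocated context and that every counter attached to the task takes its own reference. All names here (struct counter_ctx, get_ctx, put_ctx, attach_counter, task_exit, release_counter) are hypothetical and only model the scenario above; this is not the actual patch or any kernel API.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: context lives outside the task and is refcounted. */
struct counter_ctx {
    atomic_int refcount;   /* one ref held by the task, one per counter */
    int nr_counters;       /* stands in for the counter list */
};

struct task {
    struct counter_ctx *ctx;   /* a pointer, not an embedded struct */
};

struct counter {
    struct counter_ctx *ctx;   /* the counter holds its own reference */
};

static struct counter_ctx *ctx_alloc(void)
{
    struct counter_ctx *ctx = malloc(sizeof(*ctx));
    if (!ctx)
        return NULL;
    atomic_init(&ctx->refcount, 1);   /* the task's reference */
    ctx->nr_counters = 0;
    return ctx;
}

static void get_ctx(struct counter_ctx *ctx)
{
    atomic_fetch_add(&ctx->refcount, 1);
}

/* Drops one reference; returns true if this call freed the context. */
static bool put_ctx(struct counter_ctx *ctx)
{
    if (atomic_fetch_sub(&ctx->refcount, 1) == 1) {
        free(ctx);
        return true;
    }
    return false;
}

/* Task A puts a counter on task B: the counter pins B's context. */
static void attach_counter(struct counter *c, struct task *t)
{
    get_ctx(t->ctx);
    c->ctx = t->ctx;
    c->ctx->nr_counters++;
}

/* Task B exits and is reaped: drop only the task's own reference. */
static bool task_exit(struct task *t)
{
    bool freed = put_ctx(t->ctx);
    t->ctx = NULL;
    return freed;
}

/* Analogue of task A closing the fd: still safe after B is gone,
 * because the counter's reference keeps the context allocated. */
static bool release_counter(struct counter *c)
{
    c->ctx->nr_counters--;
    bool freed = put_ctx(c->ctx);
    c->ctx = NULL;
    return freed;
}
```

In the A/B scenario, task_exit() on B returns false because the counter still holds a reference, and the later release_counter() drops the last reference and frees the context, so nothing touches freed memory.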

Paul.