Re: [PATCH v2] perf_counter: Add counter enable/disable ioctls

From: Ingo Molnar
Date: Sun Jan 18 2009 - 12:13:54 EST



* Paul Mackerras <paulus@xxxxxxxxx> wrote:

> Impact: New perf_counter features
>
> This primarily adds a way for perf_counter users to enable and disable
> counters and groups. Enabling or disabling a counter or group also
> enables or disables all of the child counters that have been cloned
> from it to monitor children of the task monitored by the top-level
> counter. The userspace interface to enable/disable counters is via
> ioctl on the counter file descriptor.
>
> Along the way this extends the code that handles child counters to
> handle child counter groups properly. A group with multiple counters
> will be cloned to child tasks if and only if the group leader has the
> hw_event.inherit bit set - if it is set the whole group is cloned as a
> group in the child task.
>
> In order to be able to enable or disable all child counters of a given
> top-level counter, we need a way to find them all. Hence I have added
> a child_list field to struct perf_counter, which is the head of the
> list of children for a top-level counter, or the link in that list for
> a child counter. That list is protected by the perf_counter.mutex
> field.
>
> This also adds a mutex to the perf_counter_context struct. Previously
> the list of counters was protected just by the lock field in the
> context, which meant that perf_counter_init_task had to take that lock
> and then take whatever lock/mutex protects the top-level counter's
> child_list. But the counter enable/disable functions need to take
> that lock in order to traverse the list, then for each counter take
> the lock in that counter's context in order to change the counter's
> state safely, which will lead to a deadlock.
>
> To solve this, we now have both a mutex and a spinlock in the context,
> and taking either is sufficient to ensure the list of counters can't
> change - you have to take both before changing the list. Now
> perf_counter_init_task takes the mutex instead of the lock (which
> incidentally means that inherit_counter can use GFP_KERNEL instead of
> GFP_ATOMIC) and thus avoids the possible deadlock. Similarly the new
> enable/disable functions can take the mutex while traversing the list
> of child counters without incurring a possible deadlock when the
> counter manipulation code locks the context for a child counter.
>
> We also had an misfeature that the first counter added to a context
> would possibly not go on until the next sched-in, because we were
> using ctx->nr_active to detect if the context was running on a CPU.
> But nr_active is the number of active counters, and if that was zero
> (because the context didn't have any counters yet) it would look like
> the context wasn't running on a cpu and so the retry code in
> __perf_install_in_context wouldn't retry. So this adds an 'is_active'
> field that is set when the context is on a CPU, even if it has no
> counters. The is_active field is only used for task contexts, not for
> per-cpu contexts.
>
> If we enable a subsidiary counter in a group that is active on a CPU,
> and the arch code can't enable the counter, then we have to pull the
> whole group off the CPU. We do this with group_sched_out, which gets
> moved up in the file so it comes before all its callers. This also
> adds similar logic to __perf_install_in_context so that the "all on,
> or none" invariant of groups is preserved when adding a new counter to
> a group.
>
> Signed-off-by: Paul Mackerras <paulus@xxxxxxxxx>
> ---
> This version fixes a silly bug (err uninitialized in perf_ioctl) and
> makes the state of a cloned counter follow the state of the parent
> counter (not its hw_event.disable bit).
>
> This is in my perfcounters.git tree master branch at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/paulus/perfcounters.git master
>
> Ingo - please pull.

Pulled into tip/percounters/core, thanks Paul!

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/