Re: [PATCH v2]: perf/core: addressing 4x slowdown during per-process, profiling of STREAM benchmark on Intel Xeon Phi

From: Alexey Budankov
Date: Wed Jun 21 2017 - 13:02:21 EST



Hi,

On 15.06.2017 20:42, Alexey Budankov wrote:
On 29.05.2017 14:45, Alexey Budankov wrote:
On 29.05.2017 14:23, Peter Zijlstra wrote:
On Mon, May 29, 2017 at 01:56:05PM +0300, Alexey Budankov wrote:
On 29.05.2017 13:43, Peter Zijlstra wrote:

Why can't the tree do both?


Well, indeed, the tree provides such capability too. However, switching to
the full tree iteration in cases where we now go through _groups lists will
enlarge the patch, which is probably not a big deal. Do you think it is
worth implementing the switch?

Do it as a series of patches, where patch 1 introduces the tree, patches
2 through n convert the list users into tree users, and patch n+1
removes the list.

Well ok, let's do that additionally, but please expect a delay in delivery (I am OOO till Jun 14).

addressed in v3.



I think it's good to not have duplicate data structures if we can avoid
it.


yeah, makes sense.






After a straightforward switch from struct list_head to struct rb_root for
flexible_groups, I now get dmesg dumps about rb tree corruption. This happens
when iterating through the tree instead of through the list. No additional
synchronization for the tree access was added. It looks like the
implementation itself makes some assumptions about the list_head type.

Are there any ideas on why these corruptions may happen?

I still suggest isolating event groups into a separate object (please see patch v4-1/4):

struct perf_event_groups {
	struct rb_root tree;
	struct list_head list;
};

struct perf_event_context {
	...
	struct perf_event_groups pinned_groups;
	struct perf_event_groups flexible_groups;
	...
};

and implementing new API for the object:

perf_event_groups_empty()
perf_event_groups_init()
perf_event_groups_insert()
perf_event_groups_delete()
perf_event_groups_rotate(..., int cpu)
perf_event_groups_iterate_cpu(..., int cpu)
perf_event_groups_iterate()

so that perf_event_groups_iterate() would still go through the list, leaving
iteration through the tree for a separate patch, because a complete
transition to rb trees may incur synchronization overhead at runtime.
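
To make the proposed split concrete, here is a rough userspace sketch of the object, not the actual patch. Everything in it is an assumption for illustration: struct group stands in for a perf event group, a plain unbalanced binary search tree keyed by cpu stands in for the kernel rb tree, a singly linked list stands in for list_head, and the callback-style signatures are hypothetical. Only init, empty, insert and the two iterators are shown; delete and rotate are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an event group; not the kernel's perf_event. */
struct group {
	int cpu;                    /* CPU the group is bound to */
	int id;                     /* label used only by this example */
	struct group *list_next;    /* insertion-order list linkage */
	struct group *left, *right; /* cpu-keyed tree linkage */
};

/* One object holding both views of the same set of groups. */
struct perf_event_groups {
	struct group *tree;         /* unbalanced BST, stand-in for rb_root */
	struct group *list_head;    /* stand-in for list_head */
	struct group *list_tail;
};

typedef void (*group_cb)(struct group *grp, void *data);

static void perf_event_groups_init(struct perf_event_groups *g)
{
	g->tree = NULL;
	g->list_head = g->list_tail = NULL;
}

static int perf_event_groups_empty(const struct perf_event_groups *g)
{
	return g->list_head == NULL;
}

static void perf_event_groups_insert(struct perf_event_groups *g,
				     struct group *n)
{
	struct group **link;

	assert(g && n);
	n->left = n->right = n->list_next = NULL;

	/* Append to the list so full iteration keeps insertion order. */
	if (g->list_tail)
		g->list_tail->list_next = n;
	else
		g->list_head = n;
	g->list_tail = n;

	/* Equal keys go right, so same-cpu groups stay on one search path. */
	link = &g->tree;
	while (*link)
		link = (n->cpu < (*link)->cpu) ? &(*link)->left
					       : &(*link)->right;
	*link = n;
}

/* Full iteration walks the list, as in the current code. */
static void perf_event_groups_iterate(struct perf_event_groups *g,
				      group_cb cb, void *data)
{
	struct group *n;

	for (n = g->list_head; n; n = n->list_next)
		cb(n, data);
}

/* Per-cpu iteration walks the tree and visits only matching groups. */
static void perf_event_groups_iterate_cpu(struct perf_event_groups *g,
					  int cpu, group_cb cb, void *data)
{
	struct group *n = g->tree;

	while (n) {
		if (cpu < n->cpu) {
			n = n->left;
		} else {
			if (n->cpu == cpu)
				cb(n, data);
			n = n->right;
		}
	}
}

/* Example callback: record visited ids in a small fixed buffer. */
struct id_log {
	int ids[8];
	int n;
};

static void log_id(struct group *grp, void *data)
{
	struct id_log *log = data;

	if (log->n < 8)
		log->ids[log->n++] = grp->id;
}
```

In this sketch the callers never touch the tree or the list directly, so moving perf_event_groups_iterate() from the list to the tree later would be a change inside the object only. The sketch does no locking at all, so it says nothing about the synchronization question raised above.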

Thanks,
Alexey