Re: [RFC PATCH V2 2/7] perf: Init/fini PMU specific data

From: Liang, Kan
Date: Mon Jan 06 2020 - 11:18:28 EST




On 1/6/2020 9:31 AM, Peter Zijlstra wrote:
On Mon, Jan 06, 2020 at 06:23:43AM -0800, Andi Kleen wrote:
+ rcu_read_lock();
+ for_each_process_thread(g, p) {
+ mutex_lock(&p->perf_event_mutex);
+ if (p->perf_ctx_data) {
+ /*
+ * The perf_ctx_data for this thread may has been
+ * allocated by per-task event.
+ * Only update refcount for the case.
+ */
+ refcount_inc(&p->perf_ctx_data->refcount);
+ mutex_unlock(&p->perf_event_mutex);
+ continue;
+ }
+
+ if (pos < num_thread) {
+ refcount_set(&data[pos]->refcount, 1);
+ rcu_assign_pointer(p->perf_ctx_data, data[pos++]);
+ } else {
+ /*
+ * There may be some new threads created,
+ * when we allocate space.
+ * Track the number in nr_new_tasks.
+ */
+ nr_new_tasks++;
+ }
+ mutex_unlock(&p->perf_event_mutex);
+ }
+ rcu_read_unlock();
+
+ raw_spin_unlock_irqrestore(&task_data_sys_wide_events_lock, flags);

Still NAK. That's some mightly broken code there.

Yes, Kan you cannot use a mutex (sleeping) inside rcu_read_lock().
Can perf_event_mutex be a spin lock?

Or insize that raw_spin_lock.

The task_data_sys_wide_events_lock is a global lock. I think we just need per-task lock here.

I think I will add a new dedicate per-task raw_spin_lock for perf_ctx_data here.


And last time I expressly said to not do
what whole tasklist iteration under a spinlock.


We need an indicator to tell if the assignment for all existing tasks has been finished. Because we have to wait before processing the cases as below.
- Allocate the space for new threads. If we don't wait the assignments finished, we cannot tell if the perf_ctx_data is allocated by previous per-task event, or this system-wide event. The refcount may double count.
- There may be two or more system-wide events at the same time. When we are allocating the space for the first event, the second one may start profiling if we don't wait. The LBR shorten issue still exists.
- We have to serialize assignment and free.

If we cannot use spinlock to serialize the cases here, can we set a state when the assignment is finished, and use wait_var_event() in the cases as above?


Thanks,
Kan