Re: [PATCH v2 2/5] perf: Free aux pages in unmap path

From: Alexander Shishkin
Date: Mon Mar 14 2016 - 10:04:56 EST


Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:

> On Fri, Mar 04, 2016 at 03:42:46PM +0200, Alexander Shishkin wrote:
>> @@ -4649,10 +4679,22 @@ static void perf_mmap_close(struct vm_area_struct *vma)
>> */
>> if (rb_has_aux(rb) && vma->vm_pgoff == rb->aux_pgoff &&
>> atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &event->mmap_mutex)) {
>> + /*
>> + * Stop all aux events that are writing to this here buffer,
>> + * so that we can free its aux pages and corresponding pmu
>> + * data. Note that after rb::aux_mmap_count dropped to zero,
>> + * they won't start any more (see perf_aux_output_begin()).
>> + */
>> + perf_pmu_output_stop(event);
>
> So to me it seems like we're interested in rb, we don't particularly
> care about @event in this case.

Yeah, @event is used only for its rb and pmu down this path.

>> + if (!has_aux(event))
>> + return;
>> +
>
> if (!parent)
> parent = event;
>
>> + if (rcu_dereference(event->rb) == rb)
> s/event/parent/
>
>> + ro->err = __perf_event_stop(event);
>
>> + else if (parent && rcu_dereference(parent->rb) == rb)
>> + ro->err = __perf_event_stop(event);
>
> and these can go.. However..

You're right: it's the parent that's got the ->rb, but it may well be
the child that's actually running and writing data there. So we need to
stop any running children. I think I actually broke this bit since the
previous version, because there needs to be a comment about it.

>
>> +}
>> +
>> +static int __perf_pmu_output_stop(void *info)
>> +{
>> + struct perf_event *event = info;
>> + struct pmu *pmu = event->pmu;
>> + struct perf_cpu_context *cpuctx = get_cpu_ptr(pmu->pmu_cpu_context);
>> + struct remote_output ro = {
>> + .rb = event->rb,
>> + };
>> +
>> + rcu_read_lock();
>> + perf_event_aux_ctx(&cpuctx->ctx, __perf_event_output_stop, &ro);
>> + if (cpuctx->task_ctx)
>> + perf_event_aux_ctx(cpuctx->task_ctx, __perf_event_output_stop,
>> + &ro);
>> + rcu_read_unlock();
>> +
>> + return ro.err;
>> +}
>> +
>> +static void perf_pmu_output_stop(struct perf_event *event)
>> +{
>> + int cpu, err;
>> +
>> + /* better be thorough */
>> + get_online_cpus();
>> +restart:
>> + for_each_online_cpu(cpu) {
>> + err = cpu_function_call(cpu, __perf_pmu_output_stop, event);
>> + if (err)
>> + goto restart;
>> + }
>> + put_online_cpus();
>> +}
>
> This seems wildly overkill, could we not iterate rb->event_list like we
> do for the normal buffer?

Actually we can. One problem though is that iterating rb::event_list
requires rcu read section or irqsafe rb::event_lock and we need to send
IPIs. The normal buffer case tears down the rb::event_list as it goes,
so it can close the rcu read section right after it fetches one event
from it. In this case however, we must keep the list intact.

> Sure, we need to IPI for each event found, but that seems better than
> unconditionally sending IPIs to all CPUs.

Actually, won't it "often" be the case that the number of events will be
a multiple of the number of cpus? The usual use case being one event per
task per cpu and inheritance enabled. In this case we'll zap multiple
events per IPI.

Regards,
--
Alex