Re: [RFC][PATCH 12/12] perf: Collapse and fix event_function_call() users

From: Peter Zijlstra
Date: Wed Jan 13 2016 - 15:45:00 EST


On Wed, Jan 13, 2016 at 07:10:16PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 13, 2016 at 05:00:50PM +0200, Alexander Shishkin wrote:
> > I think I caught one, below.
> >
> > Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
> >
> > > +static int event_function(void *info)
> > > +{
> > > + struct event_function_struct *efs = info;
> > > + struct perf_event *event = efs->event;
> > > + struct perf_event_context *ctx = event->ctx;
> > > + struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
> > > + struct perf_event_context *task_ctx = cpuctx->task_ctx;
> > > +
> > > + WARN_ON_ONCE(!irqs_disabled());
> > > +
> > > + /*
> > > + * Since we do the IPI call without holding ctx->lock things can have
> > > + * changed, double check we hit the task we set out to hit.
> > > + *
> > > + * If ctx->task == current, we know things must remain valid because
> > > + * we have IRQs disabled so we cannot schedule.
> > > + */
> > > + if (ctx->task) {
> > > + if (ctx->task != current)
> > > + return -EAGAIN;
> > > +
> > > + WARN_ON_ONCE(task_ctx != ctx);
> >
> > Looks like between dropping ctx::lock in event_function_call() and here,
> > cpuctx::task_ctx may still become NULL.
>
> Hmm yes I think you're right. If the event is being migrated to another
> context concurrently the remove_from_context() might have gone through,
> we'll still be waiting for sync_rcu and then this (say enabled) happens
> and we're looking at a 'dead' context.

Hmm, no. This cannot be, all event_function_call() users should hold
ctx->mutex.