Re: perf: wrong event->count report (Was: perf basic-test-aarch64 failures)

From: Peter Zijlstra
Date: Wed Feb 17 2016 - 14:43:21 EST


On Wed, Feb 17, 2016 at 08:34:42PM +0100, Oleg Nesterov wrote:
> On 02/17, Peter Zijlstra wrote:
> >
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -3173,6 +3173,10 @@ static void perf_event_enable_on_exec(in
> >
> >  	cpuctx = __get_cpu_context(ctx);
> >  	perf_ctx_lock(cpuctx, ctx);
> > +
> > +	update_context_time(ctx);
> > +	update_cgrp_time_from_cpuctx(cpuctx);
> > +
>
> Even if I don't really understand this change I agree, probably we need to update
> the counters for enable_on_exec events somehow. But I don't see how this change can
> make total_time_running == total_time_enabled.

Yes, to get them exactly equal more is needed. But I suspect the very
small difference generated here is practically irrelevant.

Making them equal requires some very careful auditing, but is otherwise
entirely possible.
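
To put a number on "practically irrelevant": userspace usually scales a
raw count by time_enabled / time_running, so a tiny gap between the two
times only nudges the reported value. A minimal sketch, assuming the
read() layout selected by PERF_FORMAT_TOTAL_TIME_ENABLED |
PERF_FORMAT_TOTAL_TIME_RUNNING for a single event; struct counter_reading
and scale_count() are hypothetical names for illustration, not kernel or
perf-tool code:

```c
#include <stdint.h>

/*
 * Hypothetical mirror of what read() returns for one event opened with
 * PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING.
 */
struct counter_reading {
	uint64_t value;		/* raw event count */
	uint64_t time_enabled;	/* ns the event was enabled */
	uint64_t time_running;	/* ns the event was actually counting */
};

/*
 * Estimate the count the event would have reached had it been running
 * for the whole time it was enabled. Returns 0 if it never ran, and the
 * raw value if no scaling is needed.
 */
static uint64_t scale_count(const struct counter_reading *r)
{
	if (r->time_running == 0)
		return 0;
	if (r->time_running >= r->time_enabled)
		return r->value;

	/* Use double to avoid overflowing value * time_enabled; round to nearest. */
	return (uint64_t)((double)r->value * (double)r->time_enabled /
			  (double)r->time_running + 0.5);
}
```

With time_enabled = 1000001 and time_running = 1000000, a raw count of
1000000 scales to 1000001, i.e. an error of about one part per million.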

The bigger problem seems to be that this (seemingly) simple change makes
Jiri's machine explode; I have yet to look at reproducing that.