Re: [PATCH 4/5] perf, x86: Add INST_RETIRED.ALL workarounds

From: Peter Zijlstra
Date: Fri Aug 15 2014 - 10:31:53 EST


On Thu, Aug 14, 2014 at 07:47:56PM +0200, Stephane Eranian wrote:
> [+perf tool maintainers]
>
> On Thu, Aug 14, 2014 at 4:30 PM, Andi Kleen <ak@xxxxxxxxxxxxxxx> wrote:
> >
> > I understand all your points, but there's no alternative.
> > The only other way would be to disable INST_RETIRED.ALL.
> >
> You cannot do that either. INST_RETIRED:ALL is important. I assume
> the bug applies whether or not the event is used with a filter.
>
> I think we need to ensure that by looking at the perf.data file, one
> can reconstruct the total number of inst_Retired:all occurrences for
> the run. With a fixed period, one would do num_samples * fixed_period.
> I know the Gooda tool does that. It is used to estimate the number of
> events captured vs. the number of events occurring.

OK, I think we can make that work; IFF we guarantee
perf_event_attr::sample_period >= 128.

Suppose we start out with sample_period=192; then we'll set period_left
to 192, we'll end up with left = 128 (we truncate the lower bits). We
get an interrupt, find that period_left = 64 (>0 so we return 0 and
don't get an overflow handler), up that to 128. Then we trigger again,
at n=256. Then we find period_left = -64 (<=0 so we return 1 and do get
an overflow). We increment with sample_period so we get left = 128. We
fire again, at n=384, period_left = 0 (<=0 so we return 1 and get an
overflow). And on and on.

So while the individual interrupts are 'wrong' we get them with
interval=256,128 in exactly the right ratio to average out at 192. And
this works for everything >=128.

So the num_samples*fixed_period thing is still entirely correct +- 127,
which is good enough I'd say, as you already have that error anyhow.
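
For anyone who wants to check the arithmetic, here is a rough user-space
model of the trace above. It is only a sketch, not the kernel code: the
names GRANULE and overflow_points are made up for illustration, and it
assumes the counter is always programmed with period_left rounded down
to a multiple of 128, with 128 as the minimum.

```python
GRANULE = 128  # assumed hardware granularity: the low 7 bits are dropped


def overflow_points(sample_period, total_events):
    """Return the event counts at which an overflow would be reported."""
    if sample_period < GRANULE:
        raise ValueError("sample_period must be >= 128")
    period_left = sample_period
    n = 0            # events counted so far
    points = []
    while True:
        # Truncate the lower bits, but never program less than one granule.
        left = max(GRANULE, period_left & ~(GRANULE - 1))
        if n + left > total_events:
            break
        n += left                 # the interrupt fires here
        period_left -= left
        if period_left <= 0:      # "return 1": report an overflow
            points.append(n)
            period_left += sample_period
        # otherwise "return 0": reprogram and keep counting
    return points


points = overflow_points(192, 768)
print(points)  # [256, 384, 640, 768]
print([b - a for a, b in zip([0] + points, points)])  # [256, 128, 256, 128]
```

In this model the k-th overflow lands between 0 and 127 events after
k * sample_period, which is where the +- 127 comes from.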

So no need to 'fix' the tools, all we need to do is refuse to create
INST_RETIRED:ALL events with sample_period < 128.
