Re: [PATCH] perf, x86: catch spurious interrupts after disabling counters

From: Peter Zijlstra
Date: Fri Sep 17 2010 - 05:15:48 EST


On Fri, 2010-09-17 at 10:51 +0200, Robert Richter wrote:
> On 16.09.10 13:34:40, Peter Zijlstra wrote:
> > On Wed, 2010-09-15 at 18:20 +0200, Robert Richter wrote:
> > > Some cpus still deliver spurious interrupts after disabling a counter.
> > > This caused 'undelivered NMI' messages. This patch fixes this.
> > >
> > I tried the below and that also seems to work.. So yeah, looks like
> > we're getting late NMIs.
>
> I would rather prefer the fix I sent. This patch does a rdmsrl() with
> each nmi on every inactive counter.

Sure, I was just playing around trying to see if that was indeed the
problem.

> It also changes the counter value
> of all inactive counters, thus restarting a counter by only setting
> the enable bit may start with an unexpected counter value (didn't look
> at current implementation if this could be a problem).

It actually would, pmu->stop()/->start() won't save/restore the counter
value unless you add PERF_EF_UPDATE/PERF_EF_RELOAD.
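
To illustrate the point, here is a toy userspace model (not the kernel
code). The struct, the function names and the MODEL_EF_* flag values are
all made up for this sketch, and the save/restore is simplified to a
plain copy. It only shows that a value written while the counter is
stopped survives a restart unless the stop saved the value and the start
reloaded it:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag names and values; NOT the kernel's definitions. */
#define MODEL_EF_RELOAD 0x1u  /* on start: reprogram counter from saved copy */
#define MODEL_EF_UPDATE 0x2u  /* on stop: save current counter value */

/* Hypothetical model of a single hardware counter. */
struct counter_model {
	uint64_t hw_value;  /* simulated hardware counter register */
	uint64_t saved;     /* software copy, only valid after stop+UPDATE */
	bool enabled;
};

/* Disable the counter; save its value only if UPDATE was requested. */
static void model_stop(struct counter_model *c, unsigned int flags)
{
	c->enabled = false;
	if (flags & MODEL_EF_UPDATE)
		c->saved = c->hw_value;
}

/* Enable the counter; restore its value only if RELOAD was requested. */
static void model_start(struct counter_model *c, unsigned int flags)
{
	if (flags & MODEL_EF_RELOAD)
		c->hw_value = c->saved;
	c->enabled = true;
}
```

In this model, stopping with MODEL_EF_UPDATE, overwriting hw_value, and
then starting with flags == 0 leaves the overwritten value in place;
starting with MODEL_EF_RELOAD puts the saved value back.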

> It is also not possible to detect with hardware, which counter fired
> the interrupt. We cannot assume a counter overflowed by just reading
> the upper bit of the counter value. We must track this in software.

Well, exactly that seemed sufficient to not get spurious NMIs.
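
For comparison, software tracking could look roughly like the sketch
below. This is a userspace model under assumptions, not the patch under
discussion: the struct, the "active"/"running" masks and the function
names are hypothetical, and the overflow status of active counters is
passed in as a parameter instead of being read from hardware. A counter
that was enabled and later disabled may claim exactly one late NMI:

```c
#include <stdint.h>

#define NUM_COUNTERS 4

/* Hypothetical per-CPU bookkeeping; field names are illustrative. */
struct pmu_model {
	uint32_t active;   /* counters currently enabled */
	uint32_t running;  /* counters that were enabled and have not yet
	                      claimed a late NMI after being disabled */
};

static void model_enable(struct pmu_model *p, int idx)
{
	p->active  |= 1u << idx;
	p->running |= 1u << idx;
}

static void model_disable(struct pmu_model *p, int idx)
{
	/* Only drop 'active'; 'running' remembers the counter was live. */
	p->active &= ~(1u << idx);
}

/*
 * overflowed: bitmask of counters the (simulated) hardware reports as
 * overflowed. Returns how many counters account for this NMI; zero
 * means the NMI is unexplained.
 */
static int model_handle_nmi(struct pmu_model *p, uint32_t overflowed)
{
	int handled = 0;

	for (int idx = 0; idx < NUM_COUNTERS; idx++) {
		uint32_t bit = 1u << idx;

		if (!(p->active & bit)) {
			/* Inactive: accept one late NMI, then forget it. */
			if (p->running & bit) {
				p->running &= ~bit;
				handled++;
			}
			continue;
		}
		if (overflowed & bit)
			handled++;
	}
	return handled;
}
```

With this model, an NMI arriving right after model_disable() is counted
as handled once, and a second NMI with no active overflow is reported as
unexplained.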

