Re: [PATCH] perf, x86: catch spurious interrupts after disabling counters

From: Stephane Eranian
Date: Wed Sep 15 2010 - 12:36:40 EST


On Wed, Sep 15, 2010 at 6:20 PM, Robert Richter <robert.richter@xxxxxxx> wrote:
> On 14.09.10 19:41:32, Robert Richter wrote:
>> I found the reason why we get the unknown nmi. For some reason
>> cpuc->active_mask in x86_pmu_handle_irq() is zero. Thus, no counters
>> are handled when we get an nmi. It seems there is somewhere a race
>> accessing the active_mask. So far I don't have a fix available.
>> Changing x86_pmu_stop() did not help:
>
> The patch below for tip/perf/urgent fixes this.
>
> -Robert
>
> From 4206a086f5b37efc1b4d94f1d90b55802b299ca0 Mon Sep 17 00:00:00 2001
> From: Robert Richter <robert.richter@xxxxxxx>
> Date: Wed, 15 Sep 2010 16:12:59 +0200
> Subject: [PATCH] perf, x86: catch spurious interrupts after disabling counters
>
> Some cpus still deliver spurious interrupts after disabling a counter.

Most likely the interrupt was already in flight when you disabled the counter.
Does the counter value reflect this?
Do you also see this when measuring only at the user level?

> This caused 'undelivered NMI' messages. This patch fixes this.
>
> Signed-off-by: Robert Richter <robert.richter@xxxxxxx>
> ---
>  arch/x86/kernel/cpu/perf_event.c |   13 ++++++++++++-
>  1 files changed, 12 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index 3efdf28..df7aabd 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -102,6 +102,7 @@ struct cpu_hw_events {
>          */
>         struct perf_event       *events[X86_PMC_IDX_MAX]; /* in counter order */
>         unsigned long           active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
> +       unsigned long           running[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
>         int                     enabled;
>
>         int                     n_events;
> @@ -1010,6 +1011,7 @@ static int x86_pmu_start(struct perf_event *event)
>         x86_perf_event_set_period(event);
>         cpuc->events[idx] = event;
>         __set_bit(idx, cpuc->active_mask);
> +       __set_bit(idx, cpuc->running);
>         x86_pmu.enable(event);
>         perf_event_update_userpage(event);
>
> @@ -1141,8 +1143,17 @@ static int x86_pmu_handle_irq(struct pt_regs *regs)
>         cpuc = &__get_cpu_var(cpu_hw_events);
>
>         for (idx = 0; idx < x86_pmu.num_counters; idx++) {
> -               if (!test_bit(idx, cpuc->active_mask))
> +               if (!test_bit(idx, cpuc->active_mask)) {
> +                       if (__test_and_clear_bit(idx, cpuc->running))
> +                               /*
> +                                * Though we deactivated the counter
> +                                * some cpus might still deliver
> +                                * spurious interrupts. Catching them
> +                                * here.
> +                                */
> +                               handled++;
>                         continue;
> +               }
>
>                 event = cpuc->events[idx];
>                 hwc = &event->hw;
> --
> 1.7.2.2
>
>
>
>
>
> --
> Advanced Micro Devices, Inc.
> Operating System Research Center
>
>
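
To make the bookkeeping in the patch easier to follow, here is a minimal
userspace sketch of the active_mask/running idea. It is not kernel code:
pmu_state, counter_start(), counter_stop(), handle_nmi(), NUM_COUNTERS and the
"overflowed" argument are all hypothetical names made up for illustration, and
plain 64-bit masks stand in for the kernel bitmap helpers. It assumes, as the
patch appears to, that stopping a counter clears only the active bit and leaves
the running bit for the NMI handler to consume.

```c
#include <stdint.h>

#define NUM_COUNTERS 4 /* hypothetical; the patch loops to x86_pmu.num_counters */

/* Hypothetical stand-in for the two bitmaps in struct cpu_hw_events. */
struct pmu_state {
	uint64_t active_mask; /* counters that are currently enabled */
	uint64_t running;     /* counters started and not yet seen inactive by an NMI */
};

/* Mirrors the two __set_bit() calls in x86_pmu_start(). */
static void counter_start(struct pmu_state *s, int idx)
{
	s->active_mask |= UINT64_C(1) << idx;
	s->running |= UINT64_C(1) << idx;
}

/* Assumption: stop clears only the active bit; running stays set. */
static void counter_stop(struct pmu_state *s, int idx)
{
	s->active_mask &= ~(UINT64_C(1) << idx);
}

/*
 * Simulated NMI handler. "overflowed" is a made-up input marking which
 * active counters have really overflowed. Returns how many counters
 * claimed the NMI; 0 corresponds to the unknown-NMI case.
 */
static int handle_nmi(struct pmu_state *s, uint64_t overflowed)
{
	int handled = 0;

	for (int idx = 0; idx < NUM_COUNTERS; idx++) {
		uint64_t bit = UINT64_C(1) << idx;

		if (!(s->active_mask & bit)) {
			/* Inactive but recently running: claim one late NMI. */
			if (s->running & bit) {
				s->running &= ~bit;
				handled++;
			}
			continue;
		}
		if (overflowed & bit)
			handled++;
	}
	return handled;
}
```

In this sketch, starting and then stopping a counter makes the next
handle_nmi() call return 1 even with no overflow pending, and the call after
that return 0, so each stopped counter absorbs at most one late NMI.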