[Bug] Spurious hrtimer-interrupts

From: Viresh Kumar
Date: Thu Jul 10 2014 - 06:17:59 EST


Hi Thomas/Daniel et al,

This isn't about the problem I reported earlier, where you advised
adding a ONESHOT_STOPPED mode: https://lkml.org/lkml/2014/5/9/508.
That problem was about stopping the clock-event device when
it's not used anymore.

This ($subject) problem was initially spotted by Santosh on an Ivybridge V2,
12-core x86 server. I then reproduced it on a dual-core ARM
Exynos (though it isn't as frequent there as it was on x86).

Problem: Getting spurious ticks where hrtimer_interrupt() returns
without servicing any hrtimers.

Kernel hack to catch this: http://pastebin.com/bTM7nqDc (Over 3.16-rc3)
X86 boot logs: http://pastebin.com/E6axDnsa (search: hrtimer_interrupt)
/proc/cpuinfo: http://pastebin.com/uQx9TmsA

The farthest I could debug it is:

- Clockevent device is programmed for time 'x' seconds (Verified this
by storing next-event from within lapic_next_event()).
- Tick fires ~300 us before 'x'
- Traversing the list of hrtimers doesn't turn up any pending
hrtimer, so we simply return. Hence a *spurious* interrupt.

- Happens when ticks are active or stopped (search for "tick-stopped"
in logs)

Driver monitored for x86: arch/x86/kernel/apic/apic.c
Similar behavior observed on exynos with arm_arch_timer.c

I couldn't get any deeper into it to see what's going on. From the behavior,
it looks like the calculations we are doing with dev->mult/shift give
timeout <= next-event, whereas it should be >= ? Not at all sure though.
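
For illustration only, here is a minimal user-space sketch of the rounding
direction in question. It assumes the usual scaled-math form
cycles = (delta_ns * mult) >> shift, with mult derived from the device
frequency by a truncating division. The helper names, the 24 MHz frequency
and the shift value are hypothetical and not taken from the drivers above.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: scaled multiplier for a device running at freq_hz.
 * mult = (freq_hz << shift) / NSEC_PER_SEC, rounded down by the division.
 */
static uint32_t calc_mult(uint64_t freq_hz, unsigned int shift)
{
	assert(shift < 32);
	return (uint32_t)((freq_hz << shift) / NSEC_PER_SEC);
}

/*
 * Convert a nanosecond delta into device cycles.
 * The right shift truncates, so the result is never rounded up.
 */
static uint64_t ns_to_cycles(uint64_t delta_ns, uint32_t mult,
			     unsigned int shift)
{
	return (delta_ns * mult) >> shift;
}

/*
 * Convert cycles back to nanoseconds using the exact frequency,
 * to see when the device would really fire. Rounds down.
 */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t freq_hz)
{
	return cycles * NSEC_PER_SEC / freq_hz;
}
```

With the assumed 24 MHz device and shift = 24, a requested delta of 1 ms
converts to 23999 cycles instead of 24000, so the device would fire about
42 ns early. Both truncations push the programmed timeout below the
requested one, but in this sketch the error stays under two cycles, so
rounding of this kind alone would not account for a ~300 us gap.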

Reported-by: Santosh Shukla <santosh.shukla@xxxxxxxxxx>

Note: Even the hacky patchset that tried to disable the clockevent device
when it's not used anymore isn't able to fix it:
https://lkml.org/lkml/2014/5/9/99

--
viresh