Re: kernel/timer: avoid spurious ksoftirqd wakeups

From: Marcelo Tosatti
Date: Tue Apr 07 2015 - 18:29:54 EST


On Tue, Apr 07, 2015 at 10:17:23PM +0200, Frederic Weisbecker wrote:
> On Mon, Apr 06, 2015 at 08:51:26PM -0300, Marcelo Tosatti wrote:
> > On Tue, Apr 07, 2015 at 01:34:15AM +0200, Frederic Weisbecker wrote:
> > > Yeah, it would be nice to make sure that the cause of these softirqs isn't
> > > mistakenly ignored.
> > > And also I want to be sure we really understand what we
> > > are doing, which is not the case right now as we don't know what is causing
> > > this expired timer.
> >
> > What is the interrupt that is the cause for tick_nohz_stop_sched_tick,
> > you mean?
> >
> > <...>-45815 [015] d...2.. 25722056692012 (+961446): kvm_exit: reason EXTERNAL_INTERRUPT rip 0x7f5e448479d0 info 0 800000ef
> > <...>-45815 [015] d..h1.. 25722056692844 (+832): apic_timer_fn<-__run_hrtimer
> > <...>-45815 [015] d...1.. 25722056695442 (+2598): raise_softirq_irqoff <-tick_nohz_stop_sched_tick
> >
> > Emulation of guest APIC timer by hrtimer (apic_timer_fn).
>
> Nope, I meant what is the root cause of the softirq.
> But let's continue on that below:
>
> > > Sure, but why is it waking up exactly?
> >
> > Because there is a bug (the patch is trying to fix the bug by
> > raising timer softirq only when timer softirq handler has any
> > work to do).
> >
> > The only timers pending in the timer list are deferred ones
> > from vmstat_update:
> >
> > ksoftirqd/15-265 [015] ....111 25722056709372 (+7098): softirq_entry: vec=1 [action=TIMER]
> > ksoftirqd/15-265 [015] .....11 25722056709964 (+592): run_timer_softirq <-do_current_softirqs
> > ksoftirqd/15-265 [015] ....111 25722056714034 (+4070): timer_expire_entry: timer=ffff88082f6f14a0 function=delayed_work_timer_fn now=4480299175
> > ksoftirqd/15-265 [015] ....112 25722056715738 (+1704): workqueue_queue_work: work struct=ffff88082f6f1480 function=vmstat_update workqueue=ffff88041f408000 req_cpu=5120 cpu=15
> > ksoftirqd/15-265 [015] ....112 25722056716304 (+566): workqueue_activate_work: work struct ffff88082f6f1480
> > ksoftirqd/15-265 [015] ....111 25722056719052 (+2748): timer_expire_exit: timer=ffff88082f6f14a0
> > ksoftirqd/15-265 [015] ....111 25722056719384 (+332): softirq_exit: vec=1 [action=TIMER]
> >
> > Which should only be processed once there are actual add_timer timers
> > being fired (there are no such add_timer timers on this CPU).
> >
> > Does that make sense?
>
> So the source of these softirqs is those deferred timers? But deferrable timers
> are only deferrable in idle-nohz mode, not full-nohz. They are actually deferred
> in practice in full-nohz but it's a bug :o) (which I need to fix).
>
> Still, I don't think this is the source of the softirqs since your patch fixes
> the issue of non-timers triggering softirqs.
>
> So here is the issue: something that is not a "struct timer_list" is causing the
> expiry time of the next tick to be in the past or now. See tick_nohz_stop_sched_tick(),
> the softirq is triggered when delta_jiffies < 1

delta_jiffies = NEXT_TIMER_MAX_DELTA.

tick_nohz_stop_sched_tick: delta_jiffies: 1073741823 rcu_delta_jiffies: 18446744073709551615 tick_stopped: 1
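
For reference, a minimal user-space sketch (not kernel code) of how to read
those two numbers: 1073741823 is (1UL << 30) - 1, which as far as I can tell
is how NEXT_TIMER_MAX_DELTA is defined in include/linux/timer.h, and
18446744073709551615 is ULONG_MAX on a 64-bit build. The two helper names
are made up for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Assumption: mirrors the definition in include/linux/timer.h of this era. */
#define NEXT_TIMER_MAX_DELTA ((1UL << 30) - 1)

/* Hypothetical helper: true when the timer wheel reported "nothing pending". */
static bool delta_is_no_timer_sentinel(unsigned long delta_jiffies)
{
	return delta_jiffies == NEXT_TIMER_MAX_DELTA;
}

/* Hypothetical helper: true when the RCU delta was left at its maximum value. */
static bool rcu_delta_is_unset(unsigned long rcu_delta_jiffies)
{
	return rcu_delta_jiffies == ULONG_MAX;
}
```

Both helpers would return true for the values printed above, i.e. neither
the timer wheel nor RCU is asking for a tick in that sample.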

> or when the timer fails to be reprogrammed
> because it has already expired.

Right, missed that. I'll ask Luiz to gather info on why it's
failing.
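
To make sure we are talking about the same two paths, here is a simplified
user-space model of what Frederic describes. It is not the real
tick_nohz_stop_sched_tick(): it only encodes the two conditions named in
this thread (delta_jiffies < 1, or reprogramming fails because the expiry
has already passed). All names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum softirq_reason {
	SOFTIRQ_NOT_RAISED = 0,
	SOFTIRQ_DELTA_EXPIRED,    /* delta_jiffies < 1 */
	SOFTIRQ_REPROGRAM_FAILED  /* next event is already in the past */
};

/*
 * Hypothetical stand-in for programming the clock event device:
 * succeeds only if the expiry is still in the future.
 */
static bool model_program_event(uint64_t expires_ns, uint64_t now_ns)
{
	return expires_ns > now_ns;
}

/* Model of the decision: which condition, if any, raises the timer softirq. */
static enum softirq_reason model_stop_tick(unsigned long delta_jiffies,
					   uint64_t expires_ns,
					   uint64_t now_ns)
{
	if (delta_jiffies < 1)
		return SOFTIRQ_DELTA_EXPIRED;
	if (!model_program_event(expires_ns, now_ns))
		return SOFTIRQ_REPROGRAM_FAILED;
	return SOFTIRQ_NOT_RAISED;
}
```

Within this model, delta_jiffies = 1073741823 rules out the first branch,
so the reprogramming failure is the path that needs to be traced.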

>
> What can cause this expiry time to be now or in the past? Well for that we need to
> check everything that is used to evaluate the next tick:
>
> 1) struct timer_list timers
> 2) low-res hrtimers
> 3) scheduler_tick_max_deferment
> 4) timekeeping_max_deferment
> 5) (rcu|arch|irq_work)_needs_tick()
> 6) maybe something else I'm missing
>
> Your patch has reduced the softirq to only be triggered in case 1) and it works
> for you. This means the spurious softirqs that you saw were caused by 2,3,4,5 or 6.
> I want to know which one and why, because I need to understand exactly which event
> will no longer trigger a softirq after this patch. We want to know that to
> ensure there is no side effect after your patch.
>
> Thanks.