Re: [PATCH] ftrace based hard lockup detector

From: Ingo Molnar
Date: Mon Jan 19 2009 - 08:25:50 EST



* Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:

>
> On Mon, 19 Jan 2009, Ingo Molnar wrote:
>
> >
> > * Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> >
> > > On Sun, 18 Jan 2009, Frederic Weisbecker wrote:
> > >
> > > > Like the NMI watchdog, this feature try to detect hard lockups by
> > > > lurking at the non-progress of the timer interrupts.
> > > >
> > > > You can enable it at boot time by passing the ftrace_hardlockup parameter.
> > > > I plan to add a debugfs file to enable/disable at runtime.
> > > >
> > > > When a hardlockup is detected, it will print a backtrace. Perhaps it
> > > > would be good to print the locks held from lockdep too?
> > > >
> > > > It only support x86 for the moment, because a kind of generic timer interrupt
> > > > counter is needed on all archs to have it generic.
> > > >
> > > > Signed-off-by: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> > >
> > > Hi Frederic,
> > >
> > > This seems like a rewrite of the NMI lockup code. In my debugging, I
> > > simply put ftrace_dump in the NMI lockup, which gives me a ftrace dump
> > > as soon as NMI detects a lockup. I'm a bit confused at what this gives
> > > us over that?
> >
> > this is different from the NMI watchdog in a number of ways:
> >
> > - it works on all platforms and in all situations where the NMI watchdog
> > does not work.
> >
> > - in theory it can detect hard lockups in situations where the NMI
> > watchdog is disabled, such as suspend/resume or early bootup.
> > (especially early bootup lockups are nasty and the NMI watchdog is
> > enabled relatively late)
> >
> > - it could be extended to detect 'soft' lockups too - i.e. we could have
> > a one-stop facility to detect all kinds of "kernel does not seem to
> > progress" lockups.
> >
> > But it's not as complete as the NMI watchdog: it relies on instrumented
> > function calls rolling on and on during the lockup - that's not the case
> > when we get a hard lockup due to a tight, infinite loop somewhere.
>
> Ah, OK, the check is in the function tracer. Hmm, my logdev code had an
> option to enable tracing at early bootup. Instead of using the normal
> memory alloction for the ring buffer, it needed to use alloc_bootmem. I
> wonder if it would be worth it to allow for a tracer to do the same if
> it needs to be allocated early on (before memory is initialized)?

Yes, very much so!

Especially when using things like dump_trace options, this would be handy.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/