Re: [PATCH] ftrace based hard lockup detector

From: Steven Rostedt
Date: Mon Jan 19 2009 - 08:18:19 EST



On Mon, 19 Jan 2009, Ingo Molnar wrote:

>
> * Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> > On Sun, 18 Jan 2009, Frederic Weisbecker wrote:
> >
> > > Like the NMI watchdog, this feature try to detect hard lockups by
> > > lurking at the non-progress of the timer interrupts.
> > >
> > > You can enable it at boot time by passing the ftrace_hardlockup parameter.
> > > I plan to add a debugfs file to enable/disable at runtime.
> > >
> > > When a hardlockup is detected, it will print a backtrace. Perhaps it
> > > would be good to print the locks held from lockdep too?
> > >
> > > It only support x86 for the moment, because a kind of generic timer interrupt
> > > counter is needed on all archs to have it generic.
> > >
> > > Signed-off-by: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> >
> > Hi Frederic,
> >
> > This seems like a rewrite of the NMI lockup code. In my debugging, I
> > simply put ftrace_dump in the NMI lockup, which gives me a ftrace dump
> > as soon as NMI detects a lockup. I'm a bit confused at what this gives
> > us over that?
>
> this is different from the NMI watchdog in a number of ways:
>
> - it works on all platforms and in all situations where the NMI watchdog
> does not work.
>
> - in theory it can detect hard lockups in situations where the NMI
> watchdog is disabled, such as suspend/resume or early bootup.
> (especially early bootup lockups are nasty and the NMI watchdog is
> enabled relatively late)
>
> - it could be extended to detect 'soft' lockups too - i.e. we could have
> a one-stop facility to detect all kinds of "kernel does not seem to
> progress" lockups.
>
> But it's not as complete as the NMI watchdog: it relies on instrumented
> function calls rolling on and on during the lockup - that's not the case
> when we get a hard lockup due to a tight, infinite loop somewhere.

Ah, OK, the check is in the function tracer. Hmm, my logdev code had an
option to enable tracing at early bootup. Instead of using the normal
memory alloction for the ring buffer, it needed to use alloc_bootmem. I
wonder if it would be worth it to allow for a tracer to do the same if it
needs to be allocated early on (before memory is initialized)?

--Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/