Re: [PATCH] hangcheck-timer

From: Zwane Mwaikambo (zwane@holomorphy.com)
Date: Thu Jan 16 2003 - 18:43:47 EST


On Thu, 16 Jan 2003, Joel Becker wrote:

> Folks,
> Attached is a patch adding hangcheck-timer. It is used to detect
> when the system goes out to lunch for a period of time, and then
> returns. This is interesting for debugging drivers as well as for
> clustering environments.
> The module sets a timer. When the timer goes off, it then uses
> get_cycles() to determine how much real time has passed.
> On a normal system, the real elapsed time will be almost
> identical to the expected timer duration. However, if a device decided
> to udelay for 60 seconds (or some other circumstance), the module takes
> notice. If the margin of error passes a threshold, the driver prints a
> warning or the machine is rebooted.
> There are three parameters.

NMI watchdog? Doesn't look like this can catch cases where you have
interrupts disabled and spinning on a lock, if you manage to run this code
then the box isn't really locked up. In which case the software watchdog
should be as good as this.

        Zwane

-- 
function.linuxpower.ca

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Jan 23 2003 - 22:00:14 EST