Re: [PATCH v2] Softdog enhancement to optionally invoke panic instead of reboot on timer expiry

From: Américo Wang
Date: Mon Jan 24 2011 - 22:24:30 EST


On Tue, Jan 25, 2011 at 12:03:52AM +0530, Anithra P Janakiraman wrote:
>
>Hi,
>
>We currently have no way of determining the reason for failure when a
>softdog timeout occurs. We use softdog to watch for critical application
>failures, and at a minimum a snapshot of the system would help to
>determine the cause. In such a scenario the application can fail
>without any soft lockup occurring, so the soft lockup detector
>does not help.
>The patch below adds a module parameter, soft_panic, which when set to
>1 causes softdog to invoke panic() instead of rebooting when the softdog
>timer expires. Invoking panic() runs kdump if it is configured,
>and the vmcore generated by kdump should provide at least a minimal idea
>of the reason for failure.
>
>Based on an original patch by Ken Sugawara <sugaken.r3@xxxxxxxxx>
>Signed-off-by: Anithra P J <anithra@xxxxxxxxxxxxxxxxxx>

Cool, using a module parameter is better.

Reviewed-by: WANG Cong <xiyou.wangcong@xxxxxxxxx>

Thanks.

>---
> drivers/watchdog/softdog.c | 18 +++++++++++++++---
> 1 file changed, 15 insertions(+), 3 deletions(-)
>
>Index: linux-2.6.38-rc1/drivers/watchdog/softdog.c
>===================================================================
>--- linux-2.6.38-rc1.orig/drivers/watchdog/softdog.c
>+++ linux-2.6.38-rc1/drivers/watchdog/softdog.c
>@@ -48,6 +48,7 @@
> #include <linux/init.h>
> #include <linux/jiffies.h>
> #include <linux/uaccess.h>
>+#include <linux/kernel.h>
>
> #define PFX "SoftDog: "
>
>@@ -75,6 +76,13 @@
> "Softdog action, set to 1 to ignore reboots, 0 to reboot "
> "(default depends on ONLY_TESTING)");
>
>+
>+static int soft_panic;
>+
>+module_param(soft_panic, int, 0);
>+MODULE_PARM_DESC(soft_panic,
>+ "Softdog action, set to 1 to panic, 0 to reboot (default 0)");
>+
> /*
> * Our timer
> */
>@@ -98,7 +106,10 @@
>
> if (soft_noboot)
> printk(KERN_CRIT PFX "Triggered - Reboot ignored.\n");
>- else {
>+ else if (soft_panic) {
>+ printk(KERN_CRIT PFX "Initiating panic.\n");
>+ panic("Software Watchdog Timer expired.");
>+ } else {
> printk(KERN_CRIT PFX "Initiating system reboot.\n");
> emergency_restart();
> printk(KERN_CRIT PFX "Reboot didn't ?????\n");
>@@ -267,7 +278,8 @@
> };
>
> static char banner[] __initdata = KERN_INFO "Software Watchdog Timer: 0.07 "
>- "initialized. soft_noboot=%d soft_margin=%d sec (nowayout= %d)\n";
>+ "initialized. soft_noboot=%d soft_margin=%d sec soft_panic=%d "
>+ "(nowayout= %d)\n";
>
> static int __init watchdog_init(void)
> {
>@@ -298,7 +310,7 @@
> return ret;
> }
>
>- printk(banner, soft_noboot, soft_margin, nowayout);
>+ printk(banner, soft_noboot, soft_margin, soft_panic, nowayout);
>
> return 0;
> }
>
>
>