Re: [PATCH v3 2/2] Make hard lockup detection use timestamps

From: Don Zickus
Date: Mon Aug 01 2011 - 15:24:22 EST


On Mon, Aug 01, 2011 at 11:33:24AM -0700, ZAK Magnus wrote:
> Okay... So this is a problem we need to solve. Does there exist a good
> way to output a stack trace to, say, a file in /proc? I think that
> would be an appealing solution, if doable.

One idea I thought of to work around this is to save the timestamp and the
watchdog bool and restore them after the stack dump. It's a cheap hack and I
am not too sure about the locking, as it might race with
touch_nmi_watchdog(). But it gives you an idea of what I was thinking.

Since this runs in NMI context, no one else can normally touch these
variables, except for another CPU using touch_nmi_watchdog() (or
watchdog_enable(), but that should never race in these scenarios).
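
To make the idea concrete outside the kernel, here is a rough userspace
sketch of the save/restore pattern. It assumes the stack dump resets the
watchdog state as a side effect. The plain globals stand in for the per-CPU
variables, noisy_dump() and report_stall() are hypothetical names, and the
unsigned long type for the timestamp is my assumption. It ignores the
possible race with touch_nmi_watchdog() entirely.

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the per-CPU watchdog state (plain globals here). */
static unsigned long watchdog_touch_ts = 1000;
static bool watchdog_nmi_touch = false;

/*
 * Hypothetical stand-in for dump_stack(): assumed to clobber the
 * watchdog state as a side effect of printing.
 */
static void noisy_dump(void)
{
	watchdog_touch_ts = 0;
	watchdog_nmi_touch = true;
}

/* Report a stall without letting the dump disturb the watchdog state. */
static void report_stall(unsigned long stall_ms)
{
	/* Save the state before the noisy part. */
	unsigned long ts = watchdog_touch_ts;
	bool touched = watchdog_nmi_touch;

	printf("Worst stall seen: %lums\n", stall_ms);
	noisy_dump();

	/* Restore it so the detector still sees the original values. */
	watchdog_touch_ts = ts;
	watchdog_nmi_touch = touched;
}
```

Calling report_stall(250) should leave watchdog_touch_ts and
watchdog_nmi_touch at the values they had on entry, even though noisy_dump()
overwrote both in between.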

Cheers,
Don

compile tested only.


diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 17bcded..2dcedb3 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -214,6 +214,9 @@ void touch_softlockup_watchdog_sync(void)
static void update_hardstall(unsigned long stall, int this_cpu)
{
int update_stall = 0;
+ int ts;
+ bool touched;
+
if (stall > hardstall_thresh &&
stall > worst_hardstall + hardstall_diff_thresh) {
unsigned long flags;
@@ -225,10 +228,14 @@ static void update_hardstall(unsigned long stall, int this_cpu)
}

if (update_stall) {
+ ts = __this_cpu_read(watchdog_touch_ts);
+ touched = __this_cpu_read(watchdog_nmi_touch);
printk(KERN_WARNING "LOCKUP may be in progress!"
"Worst hard stall seen on CPU#%d: %lums\n",
this_cpu, stall);
dump_stack();
+ __this_cpu_write(watchdog_touch_ts, ts);
+ __this_cpu_write(watchdog_nmi_touch, touched);
}
}

@@ -262,6 +269,9 @@ static int is_hardlockup(int this_cpu)
static void update_softstall(unsigned long stall, int this_cpu)
{
int update_stall = 0;
+ int ts;
+ bool touched;
+
if (stall > get_softstall_thresh() &&
stall > worst_softstall + softstall_diff_thresh) {
unsigned long flags;
@@ -273,10 +283,14 @@ static void update_softstall(unsigned long stall, int this_cpu)
}

if (update_stall) {
+ ts = __this_cpu_read(watchdog_touch_ts);
+ touched = __this_cpu_read(watchdog_nmi_touch);
printk(KERN_WARNING "LOCKUP may be in progress!"
"Worst soft stall seen on CPU#%d: %lums\n",
this_cpu, stall);
dump_stack();
+ __this_cpu_write(watchdog_touch_ts, ts);
+ __this_cpu_write(watchdog_nmi_touch, touched);
}
}
