Re: [BUG] lockup with the latest kernel

From: Andrew Morton
Date: Wed Aug 19 2009 - 12:19:16 EST


On Wed, 19 Aug 2009 11:49:25 -0400 (EDT) Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:

> Always happens where one CPU is sending an IPI and the other has the rq
> spinlock. Seems to be that the IPI expects the other CPU to not have
> interrupts disabled or something?
>
> Note, I've seen this on 2.6.30-rc6 as well (yes that's 2.6.30). But this
> does not happen on 2.6.29. Unfortunately, 2.6.29 makes my NIC go kaputt
> for some reason.
>
> I've enabled LOCKDEP and it just makes the bug trigger easier.
>
> Anyway, anyone have any ideas?

We'd need to see the backtrace on the target CPU.

It shouldn't be too hard - set that CPU's bit in
arch/x86/kernel/apic/nmi.c:backtrace_mask and then clear it again when
that CPU has responded.
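
To illustrate the handshake described above, here is a minimal userspace
sketch. It is not kernel code. The names model_backtrace_mask,
model_request_backtrace(), model_nmi_tick() and model_backtrace_pending()
are hypothetical, and a plain atomic bitmask stands in for the kernel's
cpumask. The requester sets the target CPU's bit, the target's NMI tick
notices it and dumps state, and the cleared bit signals that the CPU has
responded.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for backtrace_mask: one bit per CPU. */
static atomic_ulong model_backtrace_mask;

/* Requester side: ask `cpu` to dump its state on its next NMI tick. */
static void model_request_backtrace(int cpu)
{
	atomic_fetch_or(&model_backtrace_mask, 1UL << cpu);
}

/* Requester side: nonzero while `cpu` has not yet responded. */
static int model_backtrace_pending(int cpu)
{
	return (int)((atomic_load(&model_backtrace_mask) >> cpu) & 1UL);
}

/*
 * Target side: models the check made from the NMI tick on `cpu`.
 * Returns 1 if a backtrace was requested (and clears the bit to signal
 * that this CPU has responded), 0 otherwise.
 */
static int model_nmi_tick(int cpu)
{
	unsigned long bit = 1UL << cpu;

	if (!(atomic_load(&model_backtrace_mask) & bit))
		return 0;

	/* Real code would dump registers and the stack here. */
	printf("NMI backtrace for cpu %d\n", cpu);
	atomic_fetch_and(&model_backtrace_mask, ~bit);
	return 1;
}
```

In this model the requester would call model_request_backtrace(n) and then
poll model_backtrace_pending(n) until it drops to zero.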

Or even:

diff -puN arch/x86/kernel/apic/nmi.c~a arch/x86/kernel/apic/nmi.c
--- a/arch/x86/kernel/apic/nmi.c~a
+++ a/arch/x86/kernel/apic/nmi.c
@@ -387,6 +387,8 @@ void touch_nmi_watchdog(void)
 }
 EXPORT_SYMBOL(touch_nmi_watchdog);

+extern int wizzle;
+
 notrace __kprobes int
 nmi_watchdog_tick(struct pt_regs *regs, unsigned reason)
 {
@@ -415,7 +417,8 @@ nmi_watchdog_tick(struct pt_regs *regs,
 	}

 	/* We can be called before check_nmi_watchdog, hence NULL check. */
-	if (backtrace_mask != NULL && cpumask_test_cpu(cpu, backtrace_mask)) {
+	if (cpu == wizzle ||
+	    (backtrace_mask != NULL && cpumask_test_cpu(cpu, backtrace_mask))) {
 		static DEFINE_SPINLOCK(lock);	/* Serialise the printks */

 		spin_lock(&lock);
diff -puN arch/x86/kernel/smp.c~a arch/x86/kernel/smp.c
--- a/arch/x86/kernel/smp.c~a
+++ a/arch/x86/kernel/smp.c
@@ -111,13 +111,17 @@
  * it goes straight through and wastes no time serializing
  * anything. Worst case is that we lose a reschedule ...
  */
+int wizzle = -1;
+
 static void native_smp_send_reschedule(int cpu)
 {
 	if (unlikely(cpu_is_offline(cpu))) {
 		WARN_ON(1);
 		return;
 	}
+	wizzle = cpu;
 	apic->send_IPI_mask(cpumask_of(cpu), RESCHEDULE_VECTOR);
+	wizzle = -1;
 }

 void native_send_call_func_single_ipi(int cpu)
_
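
The idea behind the patch can be modelled in userspace roughly as follows.
This is only a sketch under stated assumptions: every model_* name is
hypothetical, and the callback stands in for apic->send_IPI_mask(). The
sender records the target CPU in a global for the duration of the send, and
the NMI-tick condition treats a match on that global the same as a set bit
in the backtrace mask. If the send never returns, the global keeps naming
the target, so the target's next NMI tick dumps its state.

```c
/* Hypothetical stand-in for the patch's global: IPI target CPU, or -1. */
static volatile int model_wizzle = -1;

/* Models the patched condition evaluated from the NMI tick on `cpu`. */
static int model_should_backtrace(int cpu, int mask_bit_set)
{
	return cpu == model_wizzle || mask_bit_set;
}

/* Models the patched send path; `send` stands in for the real IPI send. */
static void model_send_reschedule(int cpu, void (*send)(int cpu))
{
	model_wizzle = cpu;
	send(cpu);	/* if this hangs, model_wizzle stays equal to cpu */
	model_wizzle = -1;
}

/* Records what an NMI tick on the target would decide mid-send. */
static int model_seen_during_send = -1;

static void model_probe_send(int cpu)
{
	model_seen_during_send = model_should_backtrace(cpu, 0);
}
```

Note that a single global like this only tracks one sender at a time, which
is acceptable for a throwaway debugging aid but not for anything permanent.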