Re: 2.6.20.4: NETDEV WATCHDOG and lockups

From: Christian Kujau
Date: Wed Apr 04 2007 - 09:20:53 EST


On Wed, 4 Apr 2007, Jarek Poplawski wrote:
So, it's a lot sooner than before. (BTW, isn't there anything
in debug log?)

No, nothing. I've set up remote-syslgging to the other node (node1 logging to node2 and vice versa) - nothing :(

I see both CPUs did interrupt handling again.

Yes, when booting with 'lapic' both CPUs/cores are handling interrupts again. However, since 'lapic' seems to lead to crashes here, we would be more than happy to just boot with 'noapic' but have 'irqbalance' working. Unfortunately, irqbalance is unable to write to /proc/irq/*/smp_affinity (did not help to disable CONFIG_IRQBALANCE).

Maybe it's a real locking problem. Here are some more
suggestions for testing (if you don't find anything better):
- try without SMP, so: 'acpi=off lapic nosmp'

Yeah, we'll see if we still have time for trying this. But I figure this will not be a real (long term) option for us :(

- lock debugging turned on as much as possible
plus maybe for curiosity:
- different CONFIG_HZ > - 8139TOO_PIO = y

Indeed, that's what I too wanted to do.

@Malte: any plans for another downtime?

Thanks for your comments!

Christian.
--
BOFH excuse #265:

The mouse escaped.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/