Re: 2.6.21-rc5-mm4

From: Jiri Kosina
Date: Fri Apr 06 2007 - 09:24:06 EST


On Wed, 4 Apr 2007, Eric W. Biederman wrote:

> > And the bisection winner is
> >
> > i386-irq-kill-nr_irq_vectors-and-increase-nr_irqs.patch
> >
> > I don't immediately see how it could be causing it, so adding CCs which
> > are listed in the patch.
> Weird. I will have to look at that in a little more detail.
>
> Do you know if this problem happens on x86_64? What does your .config
> look like? What does /proc/interrupts look like? What kind of hardware
> you running this kernel on? Can anyone else reproduce this?
>
> The oops clearly shows something using -1 and calling that as an
> address I don't know why, but I'm guessing I have triggered a memory
> stomp somewhere. I think this is the first time I have seen a small
> negative number causing a NULL pointer dereference.
>
> That patch looks innocuous enough that either:
> - I just missed changing something I should have.
> - Your configuration has an increase in NR_IRQS and that triggered
>   something.
> - The patch simply permuted things so a memory stomp now happens
>   on the e1000 data structures instead of somewhere else.
> - Something doesn't like large irq numbers.
>
> This work is essentially a backport from x86_64 so if your hardware
> is 64bit capable testing that should be a fairly easy test, and be
> able to rule out large irq numbers as the culprit.
>
> Until I get a good look at -mm I'm going to have a hard time guessing.
> But a roving memory stomp is my best guess.

Hi Eric,

after struggling with this issue for some time, I think that it's just
some inconsistent usage of NR_IRQS throughout the source, probably due to
some include hell. I really don't understand how the mach-*/ includes
are supposed to work.

I found out (by disassembling the resulting vmlinux binaries) that in
arch/i386/kernel/entry.S, the loop in irq_entries_start does too few
iterations compared to the NR_IRQS value as seen in, for example, io_apic.c.
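
To make the mismatch concrete, here is a minimal user-space sketch of what
happens when the file that builds a per-irq table and the file that indexes
it disagree about NR_IRQS. It is not kernel code: the two macro values, the
table, and all function names are hypothetical and chosen only for
illustration (4096 matches the proof-patch, 224 is an assumed smaller value).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values, for illustration only: the count the assembly
 * file ends up with versus the count the C code believes in. */
#define NR_IRQS_SEEN_BY_ENTRY_S   224
#define NR_IRQS_SEEN_BY_IO_APIC  4096

/* Stand-in for a table filled by a repeat loop: one slot per iteration,
 * so its length is whatever NR_IRQS expanded to in the producing file. */
static int stub_table[NR_IRQS_SEEN_BY_ENTRY_S];

/* Number of slots the producer actually emitted. */
static size_t stub_table_len(void)
{
    return sizeof stub_table / sizeof stub_table[0];
}

/* Returns 1 if a lookup for irq stays inside the table, 0 if it would
 * read past the end and pick up whatever happens to follow it. */
static int irq_has_stub(unsigned int irq)
{
    return irq < stub_table_len();
}

/* Counts irq numbers the consumer treats as valid but that have no slot. */
static unsigned int count_irqs_without_stub(void)
{
    unsigned int irq, missing = 0;

    /* The sketch assumes the consumer's limit is the larger one. */
    assert(NR_IRQS_SEEN_BY_ENTRY_S <= NR_IRQS_SEEN_BY_IO_APIC);

    for (irq = 0; irq < NR_IRQS_SEEN_BY_IO_APIC; irq++)
        if (!irq_has_stub(irq))
            missing++;
    return missing;
}
```

With these assumed values, every irq number from 224 upwards is valid from
the consumer's point of view but has no slot in the table, so a lookup for
it would return unrelated memory rather than a real entry point.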

The super-stupid proof-patch below fixes the panic on my system. It's just
to demonstrate that the i386 includes really need fixing to be consistent
somehow.

diff --git a/arch/i386/kernel/entry.S b/arch/i386/kernel/entry.S
index 976438c..b20dc07 100644
--- a/arch/i386/kernel/entry.S
+++ b/arch/i386/kernel/entry.S
@@ -53,6 +53,8 @@
 #include <asm/dwarf2.h>
 #include "irq_vectors.h"
 
+#define NR_IRQS 4096
+
 /*
  * We use macros for low-level operations which need to be overridden
  * for paravirtualization. The following will never clobber any registers:

--
Jiri Kosina