Re: kernel BUG at arch/x86/kernel/io_apic_64.c:357!

From: Yinghai Lu
Date: Tue Jul 29 2008 - 16:37:48 EST


On Tue, Jul 29, 2008 at 1:14 PM, Eric W. Biederman
<ebiederm@xxxxxxxxxxxx> wrote:
> "Yinghai Lu" <yhlu.kernel@xxxxxxxxx> writes:
>
>> On Tue, Jul 29, 2008 at 11:35 AM, Yinghai Lu <yhlu.kernel@xxxxxxxxx> wrote:
>>> On Tue, Jul 29, 2008 at 9:09 AM, Dhaval Giani <dhaval@xxxxxxxxxxxxxxxxxx> wrote:
>>>> Hi Ingo, Thomas,
>>>>
>>>> Hit this on 2.6.27-rc1
>>>>
>>>> (The kernel bug is at line 356; it shows as 357 because I applied a debug
>>>> patch to print out the irq.) (Full dmesg and .config attached)
>>>
>>> can you boot with "debug apic=verbose pci=routeirq"?
>>
>> Please try the attached patch.
>
> Ugh. Yuck bleh nasty gag.
>
> Yes, please try YH's patch; that should fix the worst of the
> problem.
>
> YH, I think you have hit the root cause of the bug with NR_IRQS being
> defined to a ridiculously low value on x86_64. At first glance your
> patch looks reasonable; I had to stop and look at why it was needed. At
> second glance it looks like it doesn't go far enough.

The x3950 has 8 IOAPICs:
ACPI: IOAPIC (id[0x0f] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 15, version 0, address 0xfec00000, GSI 0-35
ACPI: IOAPIC (id[0x0e] address[0xfec01000] gsi_base[36])
IOAPIC[1]: apic_id 14, version 0, address 0xfec01000, GSI 36-71
ACPI: IOAPIC (id[0x0d] address[0xfec02000] gsi_base[72])
IOAPIC[2]: apic_id 13, version 0, address 0xfec02000, GSI 72-107
ACPI: IOAPIC (id[0x0c] address[0xfec03000] gsi_base[108])
IOAPIC[3]: apic_id 12, version 0, address 0xfec03000, GSI 108-143
ACPI: IOAPIC (id[0x0b] address[0xfec04000] gsi_base[144])
IOAPIC[4]: apic_id 11, version 0, address 0xfec04000, GSI 144-179
ACPI: IOAPIC (id[0x0a] address[0xfec05000] gsi_base[180])
IOAPIC[5]: apic_id 10, version 0, address 0xfec05000, GSI 180-215
ACPI: IOAPIC (id[0x09] address[0xfec06000] gsi_base[216])
IOAPIC[6]: apic_id 9, version 0, address 0xfec06000, GSI 216-251
ACPI: IOAPIC (id[0x08] address[0xfec07000] gsi_base[252])
IOAPIC[7]: apic_id 8, version 0, address 0xfec07000, GSI 252-287

That is crazy. I have only played with that kind of layout in SimNow.

>
> The commit that unified interrupt vector defines appears substantially
> fumbled. YH, your description of why your patch is needed is not
> especially useful.
>
> To be perfectly clear: the maximum number of interrupts we can handle is
> NR_CPUS*NR_VECTORS. Defining NR_IRQS to a lower value is essentially a
> hack to allow us to use less memory, as we seldom push that limit.
>
> Further, the only valid definition of NR_IRQ_VECTORS is NR_IRQS,
> as we index it by irq. That really means that, now that we are unifying
> the code, we should simply kill NR_IRQ_VECTORS.
>
> I expect there is a lot more in that mess that can be cleaned up.
> Right now irq_vectors.h reads like nonsense to me. I will see if
> I can find some time soon to look into what we should be doing.

Good.
I wonder if you can make 32-bit support using the same vector on different
CPUs for different irqs, or whether that is not needed.
Also, we should drop the irq balancing on the 32-bit kernel.

YH