Re: [PATCH][2.5] provisional 32-way x445 patches

From: Zwane Mwaikambo
Date: Mon Jun 02 2003 - 21:44:08 EST

On Mon, 2 Jun 2003, James Cleverdon wrote:

> Back from holiday.
> This kludge doesn't change any IDT behavior, so it is vulnerable to vector
> exhaustion too. It just deals with large systems that have large I/O APICs.
> Since we are indexing irq_vectors by the sum of all available I/O APIC RTEs
> and not checking for overflow, we can get into trouble.

Yeah, I get the same with 2.5.70 on a 32-way NUMAQ: we just keep going
regardless of what NR_IRQS is and then try to write garbage all over the
IDT. There appears to be something wrong with my patch, as it gets
ignored each time I send it.

> Some numbers:
> * A 32-way x445 is made up of four 8-way chassis hooked together by
> scalability cables.
> * Each Summit chassis has 2 I/O APICs with 50 RTEs per. The BIOS guys are
> trying to help out by using some hardware to only use one I/O APIC for all
> but the boot chassis.
> * Each RXE100 PCI expansion box contains one or two I/O APICs with 50 RTEs
> each. Every chassis can have one RXE100.
> Even without PCI expansion boxes, 5 * 50 == 250 which is > 224. The kernel
> overflows irq_vectors and dies.
> Since the value stuffed into irq_vectors is 0x31 to 0xF8, it easily fits into
> a byte. As a quick kludge, I changed the type of irq_vectors and quadrupled
> the number. With 896 elements in the array, the system survived and ran.
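A minimal C sketch of the arithmetic quoted above. It assumes that "5 I/O APICs" means two on the boot chassis plus one on each of the other three (a reading of the BIOS arrangement, not something the mail spells out) and that 224 is the irq_vectors size being overflowed. The macro and function names are illustrative only, not kernel symbols.

```c
#include <stdint.h>

/* Figures quoted in the mail; names are illustrative, not kernel symbols. */
#define RTES_PER_IOAPIC     50
#define NR_VECTORS_AVAIL    224                      /* limit that 250 RTEs overflow */
#define IRQ_VECTORS_KLUDGE  (4 * NR_VECTORS_AVAIL)   /* "quadrupled": 896 */

/* Range of values stored in irq_vectors, as quoted. */
#define FIRST_STORED_VECTOR 0x31
#define LAST_STORED_VECTOR  0xF8

/* Every stored value fits in a byte, so a u8 array is enough. */
_Static_assert(LAST_STORED_VECTOR <= UINT8_MAX, "vector must fit in a byte");

/* The kludge: byte-sized elements, four times as many of them. */
static uint8_t irq_vectors_kludge[IRQ_VECTORS_KLUDGE];

/* Assumed layout: boot chassis keeps 2 I/O APICs, each other chassis exposes 1. */
static int ioapics_for(int chassis)
{
    return chassis > 0 ? 2 + (chassis - 1) : 0;
}

/* Total RTEs that end up summed into the irq_vectors index. */
static int total_rtes(int chassis)
{
    return ioapics_for(chassis) * RTES_PER_IOAPIC;
}
```

For the 4-chassis box, total_rtes(4) gives 250, which is past 224; the 896-entry byte array costs 896 bytes.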

Are you implying that the large array stopped your box from booting?

> For a real fix, irq_vectors should be dynamically allocated. But then, I
> should port the dynamic MAX_MP_BUSSES patch from 2.4 to 2.5 anyway....

Hmm, dynamic irq_vectors sounds good. We could start off with a static
amount and then allocate dynamically once we start reaching the silly
numbers. Dynamic allocation for everything might get interesting at
early boot.
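One way the "start static, grow dynamically" idea could look, sketched in user space with malloc/realloc. The names (irq_vectors_reserve, IRQ_VECTORS_STATIC) are hypothetical. A kernel version would have to use whatever allocator exists at that point in boot, which is the early-boot complication mentioned above.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names; the initial size is illustrative. */
#define IRQ_VECTORS_STATIC 224

static uint8_t irq_vectors_static[IRQ_VECTORS_STATIC];
static uint8_t *irq_vectors = irq_vectors_static;
static size_t irq_vectors_len = IRQ_VECTORS_STATIC;

/* Ensure irq_vectors[idx] is valid, growing the table by doubling.
 * Returns 0 on success, -1 on failure (table left unchanged). */
static int irq_vectors_reserve(size_t idx)
{
    size_t new_len = irq_vectors_len;
    uint8_t *p;

    if (idx < irq_vectors_len)
        return 0;

    while (new_len <= idx) {
        if (new_len > SIZE_MAX / 2)
            return -1;              /* would overflow size_t */
        new_len *= 2;
    }

    if (irq_vectors == irq_vectors_static) {
        /* First overflow: move off the static array onto the heap. */
        p = malloc(new_len);
        if (!p)
            return -1;
        memcpy(p, irq_vectors_static, irq_vectors_len);
    } else {
        p = realloc(irq_vectors, new_len);
        if (!p)
            return -1;
    }

    /* Zero the new tail so unused slots read as "no vector". */
    memset(p + irq_vectors_len, 0, new_len - irq_vectors_len);
    irq_vectors = p;
    irq_vectors_len = new_len;
    return 0;
}
```

A caller would invoke irq_vectors_reserve(irq) before writing irq_vectors[irq]. Small systems never leave the static array, and doubling keeps the number of reallocations logarithmic in the final size.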

This is the patch I use right now to boot 2.5.70 on a 32-way NUMAQ
without disabling IOAPICs; however, I do drop the overflow interrupts.
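This is not the patch referred to above; it is a hypothetical sketch of what dropping the overflow interrupts amounts to. It assumes NR_IRQS is 224, indexes irq_vectors by the running sum of RTEs (as described earlier in the thread), and hands out vectors sequentially from 0x31 to 0xF8 for simplicity (the real allocator spaces them out). Pins that would land past the table or past the last vector are skipped with a warning instead of being written.

```c
#include <stdint.h>
#include <stdio.h>

#define NR_IRQS       224   /* assumed static table size */
#define FIRST_VECTOR  0x31  /* lowest value stored in irq_vectors */
#define LAST_VECTOR   0xF8  /* highest value stored in irq_vectors */

static uint8_t irq_vectors[NR_IRQS];
static unsigned int next_vector = FIRST_VECTOR;
static unsigned int dropped;

/* Give one I/O APIC pin a vector. irq is the running RTE sum.
 * Returns the vector, or -1 if the pin is dropped. */
static int setup_pin_vector(int apic, int pin, unsigned int irq)
{
    if (irq >= NR_IRQS || next_vector > LAST_VECTOR) {
        /* Without this check we would index past irq_vectors[]. */
        fprintf(stderr, "IOAPIC[%d] pin %d: irq %u dropped "
                "(past NR_IRQS or out of vectors)\n", apic, pin, irq);
        dropped++;
        return -1;
    }
    irq_vectors[irq] = (uint8_t)next_vector++;
    return irq_vectors[irq];
}

/* Walk every pin of nr_apics I/O APICs with rtes_per entries each;
 * returns the total number of pins seen. */
static unsigned int setup_all(int nr_apics, int rtes_per)
{
    unsigned int irq = 0;

    for (int apic = 0; apic < nr_apics; apic++)
        for (int pin = 0; pin < rtes_per; pin++, irq++)
            setup_pin_vector(apic, pin, irq);
    return irq;
}
```

With 5 I/O APICs of 50 RTEs, the 200 available vectors run out first, so the last 50 pins are dropped rather than corrupting memory.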

I also have another patch that just increases NR_IRQS; it was used on the
same 8-quad and allowed full functionality of all the devices up to and
including node 7.

But the static arrays aren't all that nice. Do you plan on starting the
dynamic allocation of NR_IRQS-sized arrays sometime soon?


