RE: CONFIG_IRQBALANCE for AMD64?

From: Nakajima, Jun
Date: Fri May 28 2004 - 18:40:29 EST


>From: Andi Kleen [mailto:ak@xxxxxx]
>Sent: Friday, May 28, 2004 3:54 PM
>To: Nakajima, Jun
>Cc: Andi Kleen; Martin J. Bligh; linux-kernel@xxxxxxxxxxxxxxx
>Subject: Re: CONFIG_IRQBALANCE for AMD64?
>
>On Fri, May 28, 2004 at 03:05:48PM -0700, Nakajima, Jun wrote:
>> >At least the AMD chipsets found in most Opteron boxes need software
>> >balancing too.
>>
>> Actually lowest priority delivery works on P4 and AMD (I did not
tested
>> it on AMD, though), if we _update_ TPR. But I don't recommend that,
>
>True. I remember there was even a patch for that from James C. long
ago.
>I considered at one point to add it to x86-64, but ended up
>not doing anything and just recommending irqbalanced.
>
>[I didn't test it neither, so no guarantee TPR really works on AMD]
>
>> instead we should implement the similar or optimized behavior in
>> software because "soft TPR" can be more efficient and scalable. And I
>> think this is something in my mind, and I think the kernel should do
it.
>
>I'm not convinced of that. At least the current i386 implemention
>is basically a kernel thread that wakes up regularly, reads some
>statistics and then updates the APIC.

That's true, and the kirqd code basically fixed the initial solution
from Ingo, i.e. round-robin. However, during the development, we found
such simple things (e.g. round-robin) worked better in _some cases_ than
the statistics-based irq balancing. I think SuSE has some improved
round-robin for x86.

The problems with the initial round-robin were basically:
1. Too frequent -- it hurts cache, as you know
2. Interrupts throng to idle CPUs -- everybody jumps to idle CPU(s) if
any
3. IRQs move together in sync -- they bomb the same CPU together

Instead of digging those more, we implemented the current code in the
kernel simply because we got that idea. So based on those experiences,
there are a couple things we should explore more, and "soft TPR" will be
one of them. Anyway, we need to show the code and data, to convince
people.

Jun

>
>There is not much reason this cannot be done in user space, and
>user space has the advantage that more advanced heuristics (which
>I'm sure will be there) can be more easily implemented.
>
>And handling all interrupts at CPU #0 during early boot up is
>not really an issue.
>
>An kernel implementation may make sense when you're doing something
>really dynamic: e.g. not just a timer, but dynamically redirecting
>network interrupts to the CPU the process who will read from the
>socket runs on. Obviously it would need kernel support for that, since
>user space could not keep up with such a high sampling rate. But that's
>future research work (if it can be even done generically at all)
>and I don't see it on the radar screen anytime soon. We first need to
solve
>the NUMA scheduling problem, which is already hard enough ;-)
>
>And for the simpler heuristics that don't need real time updates
>user space is probably better.
>
>-Andi
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/