8390.c + 2.1 SMP, IO-APIC irq handling anomaly, 2.1.73 patch

MOLNAR Ingo (mingo@chiara.csoma.elte.hu)
Fri, 19 Dec 1997 14:00:45 +0100 (CET)

there were various bug reports regarding 8390 based cards,
strange error messages and interface stoppage under heavy
load, for basically all 2.1 kernels since the 2.1.30-ish
IRQ handling rewrite. Also, SMP+Adaptec bug reports do not
seem to have stopped either. I now have such an 8390 based
card in my SMP box, and the problem seems to be a generic
IO-APIC IRQ handling anomaly.

the direct cause of 'Reentering the interrupt handler!',
'Tx request while isr active' and 'Interrupted while
interrupts are masked!' messages and interface stoppage
was that mysteriously the 8390 irq handler was executing
on CPU#0, while we were busy transmitting on CPU#1. We
have synchronize_irq() to catch such cases, and the 8390.c
code does look correct at first sight:

/* Mask interrupts from the ethercard. */
outb_p(0x00, e8390_base + EN0_IMR);

first, there is an 8390-specific problem with this code:
it has to be executed in cli() mode, since the 8390 has
a 'current command page' state, which is by convention
PAGE0, but can be changed to PAGE1 by an interrupt
handler. This assumption works correctly on UP, but not
on SMP: if we have an IRQ handler executing on CPU#0
which has switched the page, the above write might go to
the wrong page and have no effect. But this window is
very small and fixing it did not prevent those messages.
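
to make the page problem concrete, here is a toy model of it.
This is only a sketch: the struct, the toy_* names and the
two-bank layout are made up for illustration and are not the
real 8390 register map or driver code. It just shows how a
write to the same offset can miss the interrupt mask if the
page state was changed underneath us.

```c
#include <stdint.h>
#include <string.h>

/* Toy model only: two register banks selected by a page state.
 * Names and layout are illustrative, not the real 8390 map. */
enum { TOY_PAGE0 = 0, TOY_PAGE1 = 1, TOY_NREGS = 16, TOY_IMR = 0x0f };

struct toy_nic {
    int page;                       /* 'current command page' */
    uint8_t regs[2][TOY_NREGS];     /* one bank per page */
};

/* A port write goes to whichever page is selected right now. */
static void toy_outb(struct toy_nic *nic, uint8_t val, int offset)
{
    nic->regs[nic->page][offset] = val;
}

/* Returns 1 if writing 0x00 to the IMR offset really cleared the
 * page 0 interrupt mask, 0 if the write went elsewhere. */
static int toy_mask_took_effect(int handler_switched_to_page1)
{
    struct toy_nic nic;

    memset(&nic, 0, sizeof(nic));
    nic.regs[TOY_PAGE0][TOY_IMR] = 0xff;    /* all irqs enabled */

    /* on SMP a handler on the other CPU may be in the middle of
     * a PAGE1 access when we get here */
    if (handler_switched_to_page1)
        nic.page = TOY_PAGE1;

    toy_outb(&nic, 0x00, TOY_IMR);
    return nic.regs[TOY_PAGE0][TOY_IMR] == 0x00;
}
```

on UP the handler cannot run between cli() and the write, so
only the toy_mask_took_effect(0) case can happen there.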

the second, much more subtle problem is that if this
code is executed on a non-external-IRQ-accepting CPU
(currently all IRQs go to CPU#0), it does not seem
to be guaranteed that the external IO-APIC stops
emitting interrupts. I have measured the time window, and
it's ~9 usecs. As far as i could experiment with
this on a 2-CPU box, this ill-behavior happens only if
we do the above code from the second CPU. This is
probably a matter of timing: the receiving CPU has much
less parallelism with an IRQ handler than a different CPU.
[see the patch for more speculation and timing-info]

the first solution was to add a 10 usecs delay to
synchronize_irq(). Then i tried to find a wait-less
solution, but no luck ... eg. i've tried to
disable_irq()/enable_irq() the IO-APIC (which under Linux
is in 8259-emulation mode), but it had no effect. Another
solution would be to issue an IPI to all CPUs, but i think
this has higher overhead than the 10 usecs wait (and it
scales linearly with the number of CPUs ...).
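
a rough way to see why 10 usecs is enough: the sketch below is
a toy timeline model, not kernel code. The ~8 usecs INTA and
~1 usec hard_irq_enter() figures are the measured numbers from
the patch comments; the function names and the assumption that
an IRQ is only 'visible' to synchronize_irq() once the handler
has entered are mine.

```c
/* All times in microseconds. The constants are the rough
 * measurements quoted in the patch; the model is an assumption. */
enum { INTA_USECS = 8, ENTER_USECS = 1, SYNC_DELAY_USECS = 10 };

/* Time at which an accepted IRQ shows up in the irq count,
 * ie. the handler has reached hard_irq_enter(). */
static int handler_visible_at(int irq_accepted_at)
{
    return irq_accepted_at + INTA_USECS + ENTER_USECS;
}

/* The card is masked at mask_at and the irq count is sampled
 * 'delay' usecs later. Returns 1 if an IRQ accepted at
 * irq_accepted_at (assumed <= mask_at) is seen by that sample. */
static int sync_sees_handler(int irq_accepted_at, int mask_at, int delay)
{
    return mask_at + delay >= handler_visible_at(irq_accepted_at);
}
```

with delay 0 (the old code), an IRQ accepted at t=0 and a mask
at t=5 is missed, since the handler only becomes visible at
t=9. With SYNC_DELAY_USECS even the worst case in this model,
an IRQ accepted at the same instant as the mask write, is
caught.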

thinking about it, this problem should affect _all_
synchronize_irq() users, maybe this explains some of
the Adaptec+SMP 2.1 anomalies as well? If anyone has
a better solution, and/or a good explanation why
Intel has done the IO-APIC this way, please speak up ;)

-- mingo

--- linux/arch/i386/kernel/irq.c.orig Tue Dec 23 01:43:33 1997
+++ linux/arch/i386/kernel/irq.c Thu Dec 25 01:08:16 1997
@@ -35,6 +35,7 @@
#include <asm/bitops.h>
#include <asm/smp.h>
#include <asm/pgtable.h>
+#include <asm/delay.h>

#include "irq.h"

@@ -412,19 +413,39 @@
* are no interrupts that are executing on another
* CPU we need to call this function.
+ * We have to give pending interrupts a chance to
+ * arrive (ie. let them get until hard_irq_enter()),
+ * even if they are arriving to another CPU.
+ *
* On UP this is a no-op.
void synchronize_irq(void)
- int cpu = smp_processor_id();
- int local_count = local_irq_count[cpu];
- /* Do we need to wait? */
- if (local_count != atomic_read(&global_irq_count)) {
- /* The stupid way to do this */
- cli();
- sti();
- }
+ /*
+ * Yes it's ugly, since we cannot know in advance what is
+ * pending on _another_ CPU. (without doing an IPI that is)
+ */
+ cli();
+ udelay(10); /*
+ * if a card has just issued an IRQ shortly before, we
+ * have to wait about this amount of time worst case,
+ * to make sure the IRQ gets propagated from the PIC
+ * to the CPU properly.
+ *
+ * - ~8 usecs INTA handling overhead
+ * [we cannot stop an already accepted irq from
+ * doing its INTA cycle, i think]
+ * - ~1 usecs until the CPU executes
+ * hard_irq_enter()
+ */
+ sti(); /*
+ * here we let them in (they are polling
+ * on &global_irq currently, or are pending
+ * on this local CPU)
+ */
+ cli(); /* we make sure they are finished */
+ sti();

static inline void get_irqlock(int cpu, unsigned long where)
--- linux/drivers/net/8390.c.orig Mon Dec 22 23:21:56 1997
+++ linux/drivers/net/8390.c Thu Dec 25 02:10:02 1997
@@ -184,8 +184,10 @@
length = skb->len;

/* Mask interrupts from the ethercard. */
+ cli();
outb_p(0x00, e8390_base + EN0_IMR);
- synchronize_irq();
+ synchronize_irq(); /* does implicit sti() */
if (dev->interrupt) {
printk("%s: Tx request while isr active.\n",dev->name);
outb_p(ENISR_ALL, e8390_base + EN0_IMR);