Re: New v. v. experimental HOTPLUG CPU megapatch.

From: Ingo Molnar
Date: Tue Feb 03 2004 - 05:09:31 EST



* Rusty Russell <rusty@xxxxxxxxxxxxxxx> wrote:

> Patch against 2.6.2-rc2-mm2. Works basically, gives "APIC error on
> CPU1: 08(08)" under stress. Clues welcome.

APIC error 08 is a receive accept error, i.e. most likely there was a
pending IPI (or pending hwirq) to that CPU, but the CPU was zapped and
the APIC was reset. I'd suggest adding "sti;nop;cli" instructions after
the IO-APIC masks have been redirected, to flush out pending hardirqs
and IPIs [note the nop - x86 only recognizes interrupts after the
instruction following sti, so a bare "sti;cli" never opens a window].
After this point nothing is supposed to reach this CPU. Enabling irqs at
this point should not cause any races, because you do this first, right?
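
For reference, here is a minimal sketch of how the "08" in that message
maps to a cause. The bit names follow the local APIC Error Status
Register layout as documented by Intel; esr_first_error() is a
hypothetical helper written for illustration, not a kernel function.

```c
#include <stddef.h>
#include <stdint.h>

/* Local APIC Error Status Register bits, lowest bit first. */
static const char *const esr_bit_names[8] = {
	"send checksum error",       /* bit 0 */
	"receive checksum error",    /* bit 1 */
	"send accept error",         /* bit 2 */
	"receive accept error",      /* bit 3 */
	"reserved / model-specific", /* bit 4 */
	"send illegal vector",       /* bit 5 */
	"received illegal vector",   /* bit 6 */
	"illegal register address",  /* bit 7 */
};

/*
 * Hypothetical helper (not a kernel API): return the name of the
 * lowest set error bit in an ESR value, or NULL if no bit is set.
 */
static const char *esr_first_error(uint8_t esr)
{
	for (int bit = 0; bit < 8; bit++)
		if (esr & (1u << bit))
			return esr_bit_names[bit];
	return NULL;
}
```

With the value from the report, esr_first_error(0x08) picks bit 3, the
receive accept error.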

The pending cross-CPU IPI case should not happen if the infrastructure
is correct, and external hardirqs are not an issue unless they come from
an edge-triggered device. So the worst case with your current code could
be a lost timer IRQ or a lost edge-triggered PCI irq (old ne2k cards).

Ingo