[patch, 2.6.10-rc2-mm3, x86] fix reboot hang / APIC errors

From: Ingo Molnar
Date: Sat Nov 27 2004 - 01:22:29 EST



the attached patch fixes SMP/x86 systems to not hang during reboot in
certain circumstances, printing out an endless stream of:

APIC error on CPU1: 00(04)
APIC error on CPU1: 04(04)
APIC error on CPU1: 04(04)

The bug is this: sys_reboot() calls device_shutdown(), which calls each
registered driver and shuts it down. One of them is lapic0, which is
lethal: it will disable the local APIC on the currently executing CPU.
This is buggy in a number of ways 1) there's no guarantee that the
reboot process wont migrate to another CPU at this point (it's still a
fully functioning kernel) 2) if another CPU tries to send an cross-CPU
IPI message after this CPU's local APIC has been disabled, that other
CPU will get an infinite stream of APIC error interrupts - locking the
system up in a flood of messages. The reboot never happens.

the fix is to do what we always did: only disable the local APIC in the
very final moments, as part of the smp_send_stop() logic.

AFAICS, this is a fresh changeset that came in over the ACPI BK tree, it
has not been committed to Linus' tree yet. This bug was found via
PREEMPT_RT where the race is much more likely to trigger, but there's no
reason why this could not occur in the generic kernel.

similarly, i think it's unfortunate that the IO-APIC driver has a
shutdown handler as well - while i cannot see a bug scenario right now,
i think it's quite fragile to disable all external interrupts (including
the timer interrupt!) at this point. Again, this is best done in
machine_restart(). (which already does it.)

Ingo

Signed-off-by: Ingo Molnar <mingo@xxxxxxx>

--- linux/arch/i386/kernel/apic.c.orig
+++ linux/arch/i386/kernel/apic.c
@@ -654,9 +654,12 @@ static int lapic_resume(struct sys_devic
}


+/*
+ * This device has no shutdown method - fully functioning local APICs
+ * are needed on every CPU up until machine_halt/restart/poweroff.
+ */
static struct sysdev_class lapic_sysclass = {
set_kset_name("lapic"),
- .shutdown = lapic_shutdown,
.resume = lapic_resume,
.suspend = lapic_suspend,
};

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/