Re: i386 cpu hotplug bug - instant reboot when onlining secondary

From: Zwane Mwaikambo
Date: Tue Feb 28 2006 - 17:06:57 EST


On Tue, 28 Feb 2006, Nathan Lynch wrote:

> Zwane Mwaikambo wrote:
> > On Mon, 27 Feb 2006, Nathan Lynch wrote:
> >
> > > Zwane Mwaikambo wrote:
> > > > On Sun, 19 Feb 2006, Nathan Lynch wrote:
> > > >
> > > > > On a dual P3 Xeon machine, offlining and then onlining a cpu makes the
> > > > > box instantly reboot. I've been seeing this throughout the 2.6.16-rc
> > > > > series, but wasn't able to collect more information until now. Not
> > > > > sure when this last worked, unfortunately.
> > > > >
> > > > > With the debugging patch below, I get this on serial console:
> > > >
> > > > Does 2.6.14 work? Also i wonder if it gets out of the trampoline...
> > >
> > > 2.6.14 works (albeit with an APIC error reported). When retesting
> > > 2.6.16-rc4 with your patch on top of my debugging patch, I don't see the
> > > "startup_secondary" line:
> >
> > Hi Nathan,
> >
> > Can you try the following patch? We can start moving the WARM_BOOT_HLT
> > down until it triple faults (i'm assuming it at least gets this far).
>
> Here's what I got with this one on top of a day-old -git (all
> debugging patches still applied):

Looks good. How about the following?

Index: linux-2.6.16-rc2-mm1/arch/i386/kernel/head.S
===================================================================
RCS file: /home/cvsroot/linux-2.6.16-rc2-mm1/arch/i386/kernel/head.S,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 head.S
--- linux-2.6.16-rc2-mm1/arch/i386/kernel/head.S 11 Feb 2006 16:55:14 -0000 1.1.1.1
+++ linux-2.6.16-rc2-mm1/arch/i386/kernel/head.S 28 Feb 2006 22:12:25 -0000
@@ -146,6 +146,12 @@ page_pde_offset = (__PAGE_OFFSET >> 20);
* we know the trampoline has already loaded the boot_gdt_table GDT
* for us.
*/
+#define warm_boot tsc_sync_disabled-__PAGE_OFFSET
+#define WARM_BOOT_HLT \
+ cmpl $0, warm_boot; \
+10: \
+ jne 10b
+
ENTRY(startup_32_smp)
cld
movl $(__BOOT_DS),%eax
@@ -324,6 +330,7 @@ is386: movl $2,%ecx # set MP
cmpb $0,%cl
je 1f # the first CPU calls start_kernel
# all other CPUs call initialize_secondary
+ WARM_BOOT_HLT
call initialize_secondary
jmp L6
1:
Index: linux-2.6.16-rc2-mm1/arch/i386/kernel/smpboot.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.16-rc2-mm1/arch/i386/kernel/smpboot.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 smpboot.c
--- linux-2.6.16-rc2-mm1/arch/i386/kernel/smpboot.c 11 Feb 2006 16:55:14 -0000 1.1.1.1
+++ linux-2.6.16-rc2-mm1/arch/i386/kernel/smpboot.c 28 Feb 2006 15:34:42 -0000
@@ -102,7 +102,7 @@ static cpumask_t smp_commenced_mask;
* is no way to resync one AP against BP. TBD: for prescott and above, we
* should use IA64's algorithm
*/
-static int __devinitdata tsc_sync_disabled;
+int __devinitdata tsc_sync_disabled;

/* Per CPU bogomips and other parameters */
struct cpuinfo_x86 cpu_data[NR_CPUS] __cacheline_aligned;
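
For anyone following along, here is a rough C model of what the WARM_BOOT_HLT macro in the patch does. It is only an illustration and not part of the patch. The macro references tsc_sync_disabled (made non-static so head.S can see it) at its link address minus __PAGE_OFFSET, i.e. its physical address, and spins in place when the value is non-zero. The constant 0xC0000000 assumes the default 3G/1G split, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: default i386 3G/1G split, so __PAGE_OFFSET is 0xC0000000. */
#define PAGE_OFFSET_ASSUMED 0xC0000000u

/*
 * Hypothetical helper: models "symbol - __PAGE_OFFSET", which turns a
 * kernel link-time (virtual) address into the matching physical address.
 */
static uint32_t virt_to_phys_sketch(uint32_t vaddr)
{
    return vaddr - PAGE_OFFSET_ASSUMED;
}

/*
 * Hypothetical helper: models the macro's control flow.
 * "cmpl $0, warm_boot" sets ZF when the flag is zero, and "jne 10b"
 * branches back to itself while ZF is clear, so the CPU spins forever
 * when the flag is non-zero and falls through when it is zero.
 */
static int warm_boot_would_spin(int flag)
{
    return flag != 0;
}

/* Small self-check of the two helpers above. */
static void warm_boot_sketch_self_check(void)
{
    assert(virt_to_phys_sketch(0xC0000000u) == 0u);
    assert(warm_boot_would_spin(1));
    assert(!warm_boot_would_spin(0));
}
```

If the box hangs at the marker instead of rebooting, the secondary got at least that far, and the marker can be moved further down the boot path on the next run.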