RE: [Hotplug_sig] RE: [PATCH 1/6]sep initializing rework

From: Protasevich, Natalie
Date: Tue Apr 12 2005 - 16:38:12 EST


> Hello,
> This is a hotplug CPU patch for i386, done against 2.6.12-rc2-mm3.
> It is somewhat of an alternative to the one posted by Li Shaohua, but
> not really (and I didn't mean it that way :).
> If you look closer, our patches are different and I think they can
> complement each other.
> Li did a great job on sep, the after-offline cleanup, __devinit etc.,
> and I have some radical changes in the AP bringup mechanism. I left
> the __init to __devinit part alone (I was going through it lately, and
> even though I had a few more conversions than Li did, he has probably
> covered it sufficiently).
> I started having doubts about free_initmem() vs __devinit, because
> look how many __init's are left! Just a few :).
> I got rid of the do_boot_cpu loop in smpboot.c, because this loop in
> smp_init() (init/main.c):
>
> static void __init smp_init(void)
> {
>         unsigned int i;
>
>         /* FIXME: This should be done in userspace --RR */
>         for_each_present_cpu(i) {
>                 if (num_online_cpus() >= max_cpus)
>                         break;
>                 if (!cpu_online(i))
>                         cpu_up(i);
>         }
>         ...
>
> does it again, so why leave the one in smpboot.c and boot the APs twice.
> I also found that my system fails sooner or later if I do not
> synchronize a runtime-booted processor with the others, so I changed
> the TSC synchronization to sync only between the CPU being booted and
> the one that boots it.
> The patch works for me on a generic 8-way Intel box and on an ES7000.
> I was asked to split my patch into smaller, per-theme pieces, but I am
> posting the entire patch for now, because it is probably not the final
> version. I think (I hope) I will sync up with Li later on.
> My idea was that if we find a CPU core in ACPI (enabled or disabled),
> we account for it in the sibling map and create a sysfs node for it
> accordingly, and cpu_possible_map will reflect that. We then take
> processors up or down through that existing node, depending on
> physical presence. That is the scenario implemented on the ES7000,
> which reports all possible cores in ACPI and marks the absent
> processors as disabled.
> Runtime enabling/disabling depends on sysfs only, and the driving
> agent can be anything (ACPI or the user) that writes to the sysfs node
> for that processor.
> Thanks,
> --Natalie
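
To make the sysfs scenario above concrete: the driving agent only has to
write to the per-CPU "online" node that already exists, and the generic
attribute handler in drivers/base/cpu.c routes that write to
cpu_up()/cpu_down(). A simplified, approximate sketch of that handler
(for illustration only, not part of this patch):

#include <linux/cpu.h>
#include <linux/sysdev.h>

/* Simplified sketch of the generic sysfs "online" store handler;
 * hotplug-event notification and error details are omitted. */
static ssize_t store_online(struct sys_device *dev, const char *buf,
                            size_t count)
{
        struct cpu *cpu = container_of(dev, struct cpu, sysdev);
        ssize_t ret;

        switch (buf[0]) {
        case '0':       /* "echo 0 > .../cpuN/online" -> offline */
                ret = cpu_down(cpu->sysdev.id);
                break;
        case '1':       /* "echo 1 > .../cpuN/online" -> online */
                ret = cpu_up(cpu->sysdev.id);
                break;
        default:
                ret = -EINVAL;
        }
        return ret < 0 ? ret : count;
}

So whether the trigger is ACPI reacting to a physical insert/eject or a
user doing the echo by hand, both end up in the same cpu_up()/cpu_down()
path that the patch below changes.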

I forgot to mention that while working on the patch I found a problem in
the ACPI driver when counting the number of records in
acpi_table_parse_madt_family() (drivers/acpi/tables.c). It always
returned 0 for the number of records, and this has to be fixed for my
patch to work properly, since I rely on the number of LAPIC entries
found by ACPI. I am going to file a bugzilla entry on this; for now the
following change worked for me:
while (((unsigned long) entry) + sizeof(acpi_table_entry_header) < madt_end) {
if (entry->type == entry_id &&
- (!max_entries || count++ < max_entries))
+ ((count++ < max_entries) || !max_entries))
(with the original operand order the short-circuit can skip the count++
entirely, so the function did not count the entries for me).
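
To show what the reordering does, independent of the kernel itself: the
new operand order guarantees that count++ is evaluated for every
matching entry, whatever max_entries is, so the returned count reflects
all the matches. A stand-alone toy comparison (hypothetical
count_old()/count_new() helpers, every entry assumed to match):

#include <stdio.h>

/* Toy model of the condition only, not the real tables.c walk. */
static unsigned int count_old(int n, unsigned int max_entries)
{
        unsigned int count = 0;
        int i;

        for (i = 0; i < n; i++)
                if (!max_entries || count++ < max_entries)
                        ;       /* handler(entry, madt_end) would run here */
        return count;
}

static unsigned int count_new(int n, unsigned int max_entries)
{
        unsigned int count = 0;
        int i;

        for (i = 0; i < n; i++)
                if ((count++ < max_entries) || !max_entries)
                        ;       /* handler(entry, madt_end) would run here */
        return count;
}

int main(void)
{
        /* Prints "old: 0 4" and "new: 4 4" for 4 matching entries:
         * the original order skips count++ whenever the short-circuit
         * on !max_entries kicks in. */
        printf("old: %u %u\n", count_old(4, 0), count_old(4, 8));
        printf("new: %u %u\n", count_new(4, 0), count_new(4, 8));
        return 0;
}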

Sorry for using an attachment in the previous email; I am resending the
patch as inline text:

diff -Naurp linux-2.6.11.6-mm/arch/i386/kernel/acpi/boot.c linux-2.6.11.6-HC/arch/i386/kernel/acpi/boot.c
--- linux-2.6.11.6-mm/arch/i386/kernel/acpi/boot.c 2005-04-06 23:08:11.000000000 -0400
+++ linux-2.6.11.6-HC/arch/i386/kernel/acpi/boot.c 2005-04-12 08:08:28.000000000 -0400
@@ -229,7 +229,7 @@ acpi_parse_lapic (

/* no utility in registering a disabled processor */
if (processor->flags.enabled == 0)
- return 0;
+ printk(KERN_DEBUG "apic %d is disabled\n",
processor->id);

x86_acpiid_to_apicid[processor->acpi_id] = processor->id;

@@ -689,6 +689,9 @@ acpi_parse_madt_lapic_entries(void)
return count;
}

+ for (i = 0; i < count; i++)
+ cpu_set(i, cpu_possible_map);
+
count = acpi_table_parse_madt(ACPI_MADT_LAPIC_NMI, acpi_parse_lapic_nmi, 0);
if (count < 0) {
printk(KERN_ERR PREFIX "Error parsing LAPIC NMI
entry\n");
diff -Naurp linux-2.6.11.6-mm/arch/i386/kernel/cpu/common.c linux-2.6.11.6-HC/arch/i386/kernel/cpu/common.c
--- linux-2.6.11.6-mm/arch/i386/kernel/cpu/common.c 2005-04-06 23:08:11.000000000 -0400
+++ linux-2.6.11.6-HC/arch/i386/kernel/cpu/common.c 2005-04-12 07:44:54.000000000 -0400
@@ -615,7 +615,11 @@ void __init cpu_init (void)
atomic_inc(&init_mm.mm_count);
current->active_mm = &init_mm;
if (current->mm)
+#ifdef CONFIG_HOTPLUG_CPU
+ current->mm = NULL;
+#else
BUG();
+#endif
enter_lazy_tlb(&init_mm, current);

load_esp0(t, thread);
diff -Naurp linux-2.6.11.6-mm/arch/i386/kernel/mpparse.c linux-2.6.11.6-HC/arch/i386/kernel/mpparse.c
--- linux-2.6.11.6-mm/arch/i386/kernel/mpparse.c 2005-04-06 23:08:12.000000000 -0400
+++ linux-2.6.11.6-HC/arch/i386/kernel/mpparse.c 2005-04-12 08:16:57.000000000 -0400
@@ -125,7 +125,7 @@ static void __init MP_processor_info (st
physid_mask_t tmp;

if (!(m->mpc_cpuflag & CPU_ENABLED))
- return;
+ printk(KERN_DEBUG "apicid %x was
disabled\n",m->mpc_apicid);

apicid = mpc_apic_id(m, translation_table[mpc_record]);

@@ -189,11 +189,13 @@ static void __init MP_processor_info (st
return;
}

+#ifndef CONFIG_HOTPLUG_CPU
if (num_processors >= maxcpus) {
printk(KERN_WARNING "WARNING: maxcpus limit of %i
reached."
" Processor ignored.\n", maxcpus);
return;
}
+#endif
num_processors++;
ver = m->mpc_apicver;

diff -Naurp linux-2.6.11.6-mm/arch/i386/kernel/smpboot.c linux-2.6.11.6-HC/arch/i386/kernel/smpboot.c
--- linux-2.6.11.6-mm/arch/i386/kernel/smpboot.c 2005-04-06 23:08:12.000000000 -0400
+++ linux-2.6.11.6-HC/arch/i386/kernel/smpboot.c 2005-04-12 08:40:08.000000000 -0400
@@ -73,6 +73,7 @@ cpumask_t cpu_online_map __cacheline_ali

cpumask_t cpu_callin_map;
cpumask_t cpu_callout_map;
+cpumask_t cpu_possible_map;
static cpumask_t smp_commenced_mask;

/* Per CPU bogomips and other parameters */
@@ -195,7 +196,13 @@ valid_k7:
static atomic_t tsc_start_flag = ATOMIC_INIT(0);
static atomic_t tsc_count_start = ATOMIC_INIT(0);
static atomic_t tsc_count_stop = ATOMIC_INIT(0);
-static unsigned long long tsc_values[NR_CPUS];
+union _tsc_values {
+ unsigned long long _tsc_value;
+ struct {
+ unsigned long tsc_value_high;
+ unsigned long tsc_value_low;
+ } _part;
+} tsc_values[NR_CPUS];

#define NR_LOOPS 5

@@ -241,12 +248,6 @@ static void __init synchronize_tsc_bp (v

rdtscll(tsc_values[smp_processor_id()]);
/*
- * We clear the TSC in the last loop:
- */
- if (i == NR_LOOPS-1)
- write_tsc(0, 0);
-
- /*
* Wait for all APs to leave the synchronization point:
*/
while (atomic_read(&tsc_count_stop) !=
num_booting_cpus()-1)
@@ -259,7 +260,7 @@ static void __init synchronize_tsc_bp (v
sum = 0;
for (i = 0; i < NR_CPUS; i++) {
if (cpu_isset(i, cpu_callout_map)) {
- t0 = tsc_values[i];
+ t0 = tsc_values[i]._tsc_value;
sum += t0;
}
}
@@ -270,7 +271,7 @@ static void __init synchronize_tsc_bp (v
for (i = 0; i < NR_CPUS; i++) {
if (!cpu_isset(i, cpu_callout_map))
continue;
- delta = tsc_values[i] - avg;
+ delta = tsc_values[i]._tsc_value - avg;
if (delta < 0)
delta = -delta;
/*
@@ -284,7 +285,7 @@ static void __init synchronize_tsc_bp (v
}
realdelta = delta;
do_div(realdelta, one_usec);
- if (tsc_values[i] < avg)
+ if (tsc_values[i]._tsc_value < avg)
realdelta = -realdelta;

printk(KERN_INFO "CPU#%d had %ld usecs TSC skew,
fixed it up.\n", i, realdelta);
@@ -312,9 +313,10 @@ static void __init synchronize_tsc_ap (v
while (atomic_read(&tsc_count_start) !=
num_booting_cpus())
mb();

- rdtscll(tsc_values[smp_processor_id()]);
+ rdtscll(tsc_values[smp_processor_id()]._tsc_value);
if (i == NR_LOOPS-1)
- write_tsc(0, 0);
+ write_tsc(tsc_values[master]._part.tsc_value_high,
+ tsc_values[master]._part.tsc_value_low);

atomic_inc(&tsc_count_stop);
while (atomic_read(&tsc_count_stop) !=
num_booting_cpus()) mb();
@@ -449,6 +451,7 @@ static void __init start_secondary(void
local_irq_enable();

wmb();
+ enable_sep_cpu(0);
cpu_idle();
}

@@ -843,7 +846,9 @@ static int __init do_boot_cpu(int apicid
x86_cpu_to_apicid[cpu] = apicid;
if (boot_error) {
/* Try to put things back the way they were before ... */
+#ifndef CONFIG_HOTPLUG_CPU
unmap_cpu_to_logical_apicid(cpu);
+#endif
cpu_clear(cpu, cpu_callout_map); /* was set here (do_boot_cpu()) */
cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
cpucount--;
@@ -852,6 +857,8 @@ static int __init do_boot_cpu(int apicid
/* mark "stuck" area as not stuck */
*((volatile unsigned long *)trampoline_base) = 0;

+ if (cpu_has_tsc && cpu_khz)
+ synchronize_tsc_bp();
return boot_error;
}

@@ -987,27 +994,6 @@ static void __init smp_boot_cpus(unsigne
*/
Dprintk("CPU present map: %lx\n",
physids_coerce(phys_cpu_present_map));

- kicked = 1;
- for (bit = 0; kicked < NR_CPUS && bit < MAX_APICS; bit++) {
- apicid = cpu_present_to_apicid(bit);
- /*
- * Don't even attempt to start the boot CPU!
- */
- if ((apicid == boot_cpu_apicid) || (apicid == BAD_APICID))
- continue;
-
- if (!check_apicid_present(bit))
- continue;
- if (max_cpus <= cpucount+1)
- continue;
-
- if (do_boot_cpu(apicid))
- printk("CPU #%d not responding - cannot use
it.\n",
- apicid);
- else
- ++kicked;
- }
-
/*
* Cleanup possible dangling ends...
*/
@@ -1057,13 +1043,17 @@ static void __init smp_boot_cpus(unsigne
struct cpuinfo_x86 *c = cpu_data + cpu;
int siblings = 0;
int i;
+#ifndef CONFIG_HOTPLUG_CPU
if (!cpu_isset(cpu, cpu_callout_map))
continue;
+#endif

if (smp_num_siblings > 1) {
for (i = 0; i < NR_CPUS; i++) {
+#ifndef CONFIG_HOTPLUG_CPU
if (!cpu_isset(i, cpu_callout_map))
continue;
+#endif
if (cpu_core_id[cpu] == cpu_core_id[i]) {
siblings++;
cpu_set(i, cpu_sibling_map[cpu]);
@@ -1094,6 +1084,7 @@ static void __init smp_boot_cpus(unsigne

setup_boot_APIC_clock();

+ enable_sep_cpu(0);
/*
* Synchronize the TSC with the AP
*/
@@ -1195,13 +1186,6 @@ void __cpu_die(unsigned int cpu)

int __devinit __cpu_up(unsigned int cpu)
{
- /* In case one didn't come up */
- if (!cpu_isset(cpu, cpu_callin_map)) {
- printk(KERN_DEBUG "skipping cpu%d, didn't come
online\n", cpu);
- local_irq_enable();
- return -EIO;
- }
-
#ifdef CONFIG_HOTPLUG_CPU
/* Already up, and in cpu_quiescent now? */
if (cpu_isset(cpu, smp_commenced_mask)) {
@@ -1210,11 +1194,19 @@ int __devinit __cpu_up(unsigned int cpu)
}
#endif

+ for (i = 0; i < NR_CPUS; i++)
+ cpu_clear(i, cpu_callout_map);
+ cpu_set(smp_processor_id(), cpu_callout_map);
+ apicid = cpu_present_to_apicid(cpu);
+ do_boot_cpu(apicid, cpu);
local_irq_enable();
/* Unleash the CPU! */
cpu_set(cpu, smp_commenced_mask);
while (!cpu_isset(cpu, cpu_online_map))
mb();
+ cpu_enable(cpu);
+ cpu_clear(cpu, cpu_callout_map);
+ cpu_clear(smp_processor_id(), cpu_callout_map);
return 0;
}

@@ -1223,11 +1215,13 @@ void __init smp_cpus_done(unsigned int m
#ifdef CONFIG_X86_IO_APIC
setup_ioapic_dest();
#endif
+#ifndef CONFIG_HOTPLUG_CPU
zap_low_mappings();
/*
* Disable executability of the SMP trampoline:
*/
set_kernel_exec((unsigned long)trampoline_base, trampoline_exec);
+#endif
}

void __init smp_intr_init(void)
diff -Naurp linux-2.6.11.6-mm/arch/i386/kernel/sysenter.c linux-2.6.11.6-HC/arch/i386/kernel/sysenter.c
--- linux-2.6.11.6-mm/arch/i386/kernel/sysenter.c 2005-04-06 23:08:12.000000000 -0400
+++ linux-2.6.11.6-HC/arch/i386/kernel/sysenter.c 2005-04-12 07:59:36.000000000 -0400
@@ -58,7 +58,6 @@ static int __init sysenter_setup(void)
&vsyscall_sysenter_start,
&vsyscall_sysenter_end - &vsyscall_sysenter_start);

- on_each_cpu(enable_sep_cpu, NULL, 1, 1);
return 0;
}

diff -Naurp linux-2.6.11.6-mm/include/asm/smp.h linux-2.6.11.6-HC/include/asm/smp.h
--- linux-2.6.11.6-mm/include/asm/smp.h 2005-04-06 23:09:26.000000000 -0400
+++ linux-2.6.11.6-HC/include/asm/smp.h 2005-04-12 08:06:48.000000000 -0400
@@ -55,7 +55,7 @@ extern u8 x86_cpu_to_apicid[];

extern cpumask_t cpu_callout_map;
extern cpumask_t cpu_callin_map;
-#define cpu_possible_map cpu_callout_map
+extern cpumask_t cpu_possible_map;

/* We don't mark CPUs online until __cpu_up(), so we need another measure */
static inline int num_booting_cpus(void)
diff -Naurp linux-2.6.11.6-mm/include/asm-i386/smp.h linux-2.6.11.6-HC/include/asm-i386/smp.h
--- linux-2.6.11.6-mm/include/asm-i386/smp.h 2005-04-06 23:09:26.000000000 -0400
+++ linux-2.6.11.6-HC/include/asm-i386/smp.h 2005-04-12 08:06:48.000000000 -0400
@@ -55,7 +55,7 @@ extern u8 x86_cpu_to_apicid[];

extern cpumask_t cpu_callout_map;
extern cpumask_t cpu_callin_map;
-#define cpu_possible_map cpu_callout_map
+extern cpumask_t cpu_possible_map;

/* We don't mark CPUs online until __cpu_up(), so we need another measure */
static inline int num_booting_cpus(void)
diff -Naurp linux-2.6.11.6-mm/init/main.c linux-2.6.11.6-HC/init/main.c
--- linux-2.6.11.6-mm/init/main.c 2005-04-06 23:08:21.000000000 -0400
+++ linux-2.6.11.6-HC/init/main.c 2005-04-12 08:46:54.000000000 -0400
@@ -684,7 +684,9 @@ static int init(void * unused)
* we're essentially up and running. Get rid of the
* initmem segments and start the user-mode stuff..
*/
+#ifndef CONFIG_HOTPLUG_CPU
free_initmem();
+#endif
unlock_kernel();
system_state = SYSTEM_RUNNING;
numa_default_policy();


-----Original Message-----
From: hotplug_sig-bounces@xxxxxxxxxxxxxx
[mailto:hotplug_sig-bounces@xxxxxxxxxxxxxx] On Behalf Of Protasevich,
Natalie
Sent: Tuesday, April 12, 2005 11:57 AM
To: Zwane Mwaikambo; Li Shaohua
Cc: Andrew Morton; Len Brown; ACPI-DEV; Ryan Harper;
hotplug_sig@xxxxxxxxxxxxxx; Pavel Machek; Andi Kleen; lkml
Subject: [Hotplug_sig] RE: [PATCH 1/6]sep initializing rework

-----Original Message-----
From: Zwane Mwaikambo [mailto:zwane@xxxxxxxxxxxxxxxx]
Sent: Tuesday, April 12, 2005 6:08 AM
To: Li Shaohua
Cc: lkml; ACPI-DEV; Len Brown; Pavel Machek; Andrew Morton; Protasevich,
Natalie; Ryan Harper
Subject: Re: [PATCH 1/6]sep initializing rework

Hello Shaohua,

On Tue, 12 Apr 2005, Li Shaohua wrote:

> These patches (together with the 5 patches that follow this one) are
> updated suspend/resume SMP patches. They fix some bugs and do cleanup
> as suggested. Now they work for both suspend-to-ram and suspend-to-disk.
> Patches are against 2.6.12-rc2-mm3.

These patches look good and I think we should go ahead with them. I've
also cross-checked with the physical hotplug CPU patches for ES7xxx from
Natalie (added to Cc), and it does indeed look like a lot of the code
will work for her too, but I'd appreciate it if she also does a double
check.
Obviously this won't work for other upcoming users of hotplug CPU, like
Xen (Ryan added to Cc), but I think we can abstract things later on to
cover other special users.

Thanks Shaohua,
Zwane

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/