[PATCH 4/7] ia64, kdump: Don't offline APs

From: Hidetoshi Seto
Date: Thu Jun 18 2009 - 02:50:07 EST


An INIT delivered to an AP that is going offline causes a problem.

psr.mc is cleared when the bits in psr are set to SAL_PSR_BITS_TO_SET in
ia64_jump_to_sal(), so there is a small window in which the cpu can receive
an INIT even if the cpu entered there via the INIT handler. In this window
we restore registers for SAL, so an INIT asserted here will not be handled
properly.

It is hard to remove this window by masking INIT (i.e. setting psr.mc).
The OS_BOOT_RENDEZ to SAL return convention requires us to return to SAL
with a branch instruction (br.ret, not rfi), so we would have to unmask
INIT again while still in the OS.

I suppose this window will not be a real problem for cpu offline if we can
educate people not to push the INIT button during a hotplug operation. The
only exception is a race between kdump and INIT. Currently kdump returns
APs to SAL before processing the dump, but the kernel might receive an INIT
at that point in time. Such an INIT might be asserted by kdump itself, if
an AP doesn't react to the IPI soon enough and kdump decides to use INIT to
stop the AP.

Such panic+INIT or INIT+INIT cases should be rare, but it would be nice
if we could retrieve a crashdump even in such cases. So it is better for
kdump to stop returning APs to SAL.

I confirmed that kdump sometimes hangs on concurrent INITs (another INIT
after an INIT), and that it no longer hangs after applying this patch.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@xxxxxxxxxxxxxx>
Cc: Vivek Goyal <vgoyal@xxxxxxxxxx>
Cc: Haren Myneni <hbabu@xxxxxxxxxx>
Cc: kexec@xxxxxxxxxxxxxxxxxxx
---
arch/ia64/kernel/crash.c | 4 ----
1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/arch/ia64/kernel/crash.c b/arch/ia64/kernel/crash.c
index 48b69fd..eacedfc 100644
--- a/arch/ia64/kernel/crash.c
+++ b/arch/ia64/kernel/crash.c
@@ -142,10 +142,6 @@ kdump_cpu_freeze(struct unw_frame_info *info, void *arg)
 	atomic_inc(&kdump_cpu_frozen);
 	kdump_status[cpuid] = 1;
 	mb();
-#ifdef CONFIG_HOTPLUG_CPU
-	if (cpuid != 0)
-		ia64_jump_to_sal(&sal_boot_rendez_state[cpuid]);
-#endif
 	for (;;)
 		cpu_relax();
 }
--
1.6.0

