Re: [PATCH 2/2] x86: re-enable MCE on secondary CPUS after suspend/resume

From: Andi Kleen
Date: Fri Dec 12 2008 - 14:06:21 EST


Andreas Herrmann <andreas.herrmann3@xxxxxxx> writes:

> Impact: fix suspend/resume bug with MCE
>
> After suspend/resume MCx_CTL registers of secondary CPUs are cleared.
> (At least that's what I've observed on several systems.)
> Linux currently only re-initializes MCE on the boot CPU - see mce_resume().
> Thus after suspend/resume we end up with a system where MCE is active
> on the boot CPU but switched off on all other CPUs.
>
> By calling mce_init() whenever a CPU comes online this problem is
> solved.

Can you double check that please?

Suspend/resume are supposted to hotunplug all CPUs except the BP and
then re-online them on resume (with "disable_nonboot_cpus()) . The
re-online initializes MCEs in the standard CPU bootup path.

A good way is to stick a WARN_ON(num_online_cpus() > 1) into
mce_suspend(). I had that here for some time and didn't see
it trigger.

I got a couple of suspend bug fixes in my mce improvement tree, see:

http://git.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-x86.git;a=history;f=arch/x86/kernel/cpu/mcheck/mce_64.c;h=9512a7eab4e7b03a584f5bb647bd242bd4c003dc;hb=x86/mce

During review it was decided to all defer it to .29 though.

-Andi

--
ak@xxxxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/