Re: [PATCH] x86/mce: Don't unregister CPU hotplug notifier in error path

From: Borislav Petkov
Date: Fri Jun 20 2014 - 11:23:25 EST


On Fri, Jun 20, 2014 at 10:28:13AM -0400, Boris Ostrovsky wrote:
> Commit 9c15a24b038f4d8da93a2bc2554731f8953a7c17 (x86/mce: Improve
> mcheck_init_device() error handling) unregisters (or never registers)
> MCE's hotplug notifier if an error is encountered.

Well, mcheck_init_device() could hit errors before that commit too. Can
you please go into detail on how exactly you're triggering this? Which
error are you talking about exactly?

Lemme guess: some xen special handling which baremetal doesn't need.

> Since unplugging a CPU would normally result in the notifier deleting
> the MCE timer, we are now left with the timer running if a CPU is
> removed on a system where mcheck_init_device() had failed.
>
> If we later hotplug this CPU back we add this timer again in
> mcheck_cpu_init(). Eventually the two timers start interfering with each
> other, causing soft lockups or system hangs.
>
> We should leave the notifier always on and, in fact, set it up early
> during the boot.

We do leave it always on - we only unregister it if we've encountered an
error.
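
For reference, the sequence described in the commit message can be
modelled in userspace roughly as below. This is only a toy sketch and not
kernel code: the model_* functions, timers_armed[], notifier_registered
and the NR_CPUS value are made-up names for illustration, and the counter
merely stands in for "the timer got added again while still running".

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* arbitrary size for the model */

/* Toy model: how many times each CPU's MCE timer is currently armed. */
static int timers_armed[NR_CPUS];

/* Toy model: whether the hotplug notifier is still registered. */
static bool notifier_registered;

/* Stand-in for CPU bringup, where mcheck_cpu_init() adds the timer. */
static void model_cpu_online(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	timers_armed[cpu]++;
}

/* Stand-in for CPU removal: only the notifier deletes the timer. */
static void model_cpu_offline(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (notifier_registered)
		timers_armed[cpu] = 0;
}

/*
 * Boot a CPU, unplug it and plug it back in. init_failed models
 * mcheck_init_device() having hit an error, which leaves the notifier
 * unregistered. Returns how many times the timer ends up armed.
 */
static int model_replug(int cpu, bool init_failed)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	notifier_registered = !init_failed;
	timers_armed[cpu] = 0;

	model_cpu_online(cpu);	/* initial boot */
	model_cpu_offline(cpu);	/* hot-unplug */
	model_cpu_online(cpu);	/* hotplug back */

	return timers_armed[cpu];
}
```

With the notifier in place model_replug() leaves a single armed timer;
with init_failed set the timer from boot is never deleted and a second
one is added on top of it, which is the situation the patch describes.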

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.