Re: [Help Test] kdump, x86, acpi: Reproduce CPU0 SMI corruption issue after unsetting BSP flag

From: HATAYAMA Daisuke
Date: Sun Aug 18 2013 - 22:30:26 EST


(2013/08/15 4:45), Eric W. Biederman wrote:
Jingbai Ma <jingbai.ma@xxxxxx> writes:

I found a side effect of unsetting the BSP flag.
It affects system reboot: once the BSP flag has been removed, issuing the
reboot command makes the system hang after the message:
Restarting system.
A hardware reset is then needed to recover.

I have reproduced this problem on the following systems:
HP EliteBook 6930p
HP Compaq DC7700
HP ProLiant DL980 (4 sockets, 40 cores)

I have an idea: to avoid this kind of issue, we can unset the BSP flag in
the first kernel during crash processing, and restore it in the second
kernel during AP initialization.

The premise was that clearing the BSP flag would not be an issue. If we
could reliably count on unsetting the BSP flag during crash processing, we
could just switch to the BSP and be done, avoiding this problem entirely.

Given that there are real-world issues with clearing the BSP flag,
I believe the alternate suggestion was to simply never attempt to start
the bootstrap processor during processor bring-up.

If as normal we are running on the bootstrap processor everything will
work the same, but if we are in the kdump scenario we will be short one
core. Being short one core seems like a reasonable tradeoff between
reliability and performance.

Eric

Sorry Eric, I'm not clear on what you mean by ``short one core''...
Which are you suggesting: that disabling the BSP is reasonable if the crash
happens on an AP, or that restricting the CPUs to a single one, as in the
current kdump configuration, is reasonable?

--
Thanks.
HATAYAMA, Daisuke
