[PATCH RFC] x86/cpu: fix intermittent lockup on poweroff

From: Tony Battersby
Date: Tue Apr 25 2023 - 15:57:20 EST


In stop_this_cpu(), make sure CPUID leaf 0x8000001f exists before
reading it. Without this check, the wrong branch is taken randomly on
some CPUs, which causes a lockup on poweroff 50% of the time (seen on
Supermicro X8DTH-6F with Intel Xeon X5650).

Fixes: 08f253ec3767 ("x86/cpu: Clear SME feature flag when not in use")
Cc: <stable@xxxxxxxxxxxxxxx> # 5.18+
Signed-off-by: Tony Battersby <tonyb@xxxxxxxxxxxxxxx>
---

NOTE: I don't have any AMD CPUs to test, so I was unable to fully test
this patch. Could someone with an AMD CPU that supports SME please test
this and make sure it calls native_wbinvd()?
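
For illustration only, here is a minimal user-space sketch of the guard,
not kernel code. The struct fake_cpu type, both helper names, and the
sample register values used with them are hypothetical; the sketch only
models the idea that the EAX value read from an unimplemented leaf cannot
be trusted, so the leaf number must be range-checked first.

```c
#include <stdbool.h>
#include <stdint.h>

#define SME_LEAF 0x8000001fu

/*
 * Hypothetical stand-in for a CPU: the highest extended CPUID leaf it
 * implements, and whatever EAX value a raw CPUID(0x8000001f) returns.
 */
struct fake_cpu {
	uint32_t extended_cpuid_level;
	uint32_t leaf_8000001f_eax;
};

/* Old behaviour: trust bit 0 even if the leaf is not implemented. */
static bool wants_wbinvd_unguarded(const struct fake_cpu *c)
{
	return (c->leaf_8000001f_eax & 1u) != 0;
}

/* New behaviour: only look at bit 0 when the leaf is in range. */
static bool wants_wbinvd_guarded(const struct fake_cpu *c)
{
	return c->extended_cpuid_level >= SME_LEAF &&
	       (c->leaf_8000001f_eax & 1u) != 0;
}
```

With a made-up CPU whose highest extended leaf is 0x80000008 and whose
out-of-range read happens to return EAX with bit 0 set, the unguarded
helper reports true while the guarded one reports false. A CPU that
implements leaf 0x8000001f with bit 0 set reports true from both.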


arch/x86/kernel/process.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b650cde3f64d..26aa32e8f636 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -754,13 +754,15 @@ bool xen_set_default_idle(void)

 void __noreturn stop_this_cpu(void *dummy)
 {
+	struct cpuinfo_x86 *c = this_cpu_ptr(&cpu_info);
+
 	local_irq_disable();
 	/*
 	 * Remove this CPU:
 	 */
 	set_cpu_online(smp_processor_id(), false);
 	disable_local_APIC();
-	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
+	mcheck_cpu_clear(c);

 	/*
 	 * Use wbinvd on processors that support SME. This provides support
@@ -774,7 +776,8 @@ void __noreturn stop_this_cpu(void *dummy)
 	 * Test the CPUID bit directly because the machine might've cleared
 	 * X86_FEATURE_SME due to cmdline options.
 	 */
-	if (cpuid_eax(0x8000001f) & BIT(0))
+	if (c->extended_cpuid_level >= 0x8000001f &&
+	    (cpuid_eax(0x8000001f) & BIT(0)))
 		native_wbinvd();
 	for (;;) {
 		/*
--
2.25.1