Re: cpu/hotplug: broken sibling thread hotplug

From: Josh Poimboeuf
Date: Fri Jan 25 2019 - 11:37:03 EST


On Thu, Jan 24, 2019 at 04:57:13PM +0100, Igor Mammedov wrote:
> In case guest is booted with one CPU present and then later
> a sibling CPU is hotplugged [1], it stays offline since SMT
> is disabled.
>
> Bisects to
> 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS")
> which used __max_smt_threads to decide disabling SMT and in
> case [1] only primary CPU thread is present hence SMT
> is disabled.
>
> Later bc2d8d262cba (cpu/hotplug: Fix SMT supported evaluation),
> rewrites code path but evaluation criteria still depends on
> sibling thread being present at boot time, so problem persist.
>
> 1) QEMU -smp 1,sockets=2,cores=1,threads=2 -monitor stdio ...
> # hotplug sibling thread
> (qemu) device_add qemu64-x86_64-cpu,socket-id=0,core-id=0,thread-id=1
>
> I've failed to find reasoning behind statement:
> "
> cpu/hotplug: detect SMT disabled by BIOS
>
> If SMT is disabled in BIOS, the CPU code doesn't properly detect it.
> "
>
> Question is
> 1: why cpu_smt_check_topology_early() at check_bugs()
> wasn't sufficient to detect SMT disabled in BIOS and
> 2: why side-effect of present at boot siblings were used
> to keep SMT enabled?

Hi Igor,

Thanks for reporting this.

The original problems with SMT disabled in BIOS were:

1) /sys/devices/system/cpu/smt showed "on"; and

2) vmx_vm_init() was incorrectly showing the L1TF_MSG_SMT warning.

So 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS") was
intended to fix that, by reporting SMT as permanently disabled, when the
BIOS has done so.

The problem is, I don't think there's a way to detect a difference
between the HW "SMT disabled by BIOS" case and the virt "Sibling not yet
plugged in" case.

I'd propose that we consider #1 above to *not* be a problem. Because in
the virt case, it's possible that a sibling thread can be brought online
later. So it makes sense to default with smt control "on" to allow for
that possibility.

The real problem is #2. Which is a simple fix: change vmx_vm_init() to
query the current SMT state with sched_smt_active(), instead of looking
at the "control" sysfs value.

How about this patch? It's just a revert of 73d5e2b47264 and
bc2d8d262cba, plus the 1-line vmx_vm_init() change. If it looks ok to
you, I can clean it up and submit an official version.

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 453e66291e38..2a4db86c2780 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -70,7 +70,7 @@ void __init check_bugs(void)
* identify_boot_cpu() initialized SMT support information, let the
* core code know.
*/
- cpu_smt_check_topology_early();
+ cpu_smt_check_topology();

if (!IS_ENABLED(CONFIG_SMP)) {
pr_info("CPU: ");
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 886ad6db0926..494f32ae6e6f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11102,7 +11102,7 @@ static int vmx_vm_init(struct kvm *kvm)
* Warn upon starting the first VM in a potentially
* insecure environment.
*/
- if (cpu_smt_control == CPU_SMT_ENABLED)
+ if (sched_smt_active())
pr_warn_once(L1TF_MSG_SMT);
if (l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_NEVER)
pr_warn_once(L1TF_MSG_L1D);
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 218df7f4d3e1..5041357d0297 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -180,12 +180,10 @@ enum cpuhp_smt_control {
#if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT)
extern enum cpuhp_smt_control cpu_smt_control;
extern void cpu_smt_disable(bool force);
-extern void cpu_smt_check_topology_early(void);
extern void cpu_smt_check_topology(void);
#else
# define cpu_smt_control (CPU_SMT_ENABLED)
static inline void cpu_smt_disable(bool force) { }
-static inline void cpu_smt_check_topology_early(void) { }
static inline void cpu_smt_check_topology(void) { }
#endif

diff --git a/kernel/cpu.c b/kernel/cpu.c
index d7c2e48ed10d..240fb4ab3f38 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -360,8 +360,6 @@ void __weak arch_smt_update(void) { }
enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
EXPORT_SYMBOL_GPL(cpu_smt_control);

-static bool cpu_smt_available __read_mostly;
-
void __init cpu_smt_disable(bool force)
{
if (cpu_smt_control == CPU_SMT_FORCE_DISABLED ||
@@ -378,25 +376,11 @@ void __init cpu_smt_disable(bool force)

/*
* The decision whether SMT is supported can only be done after the full
- * CPU identification. Called from architecture code before non boot CPUs
- * are brought up.
- */
-void __init cpu_smt_check_topology_early(void)
-{
- if (!topology_smt_supported())
- cpu_smt_control = CPU_SMT_NOT_SUPPORTED;
-}
-
-/*
- * If SMT was disabled by BIOS, detect it here, after the CPUs have been
- * brought online. This ensures the smt/l1tf sysfs entries are consistent
- * with reality. cpu_smt_available is set to true during the bringup of non
- * boot CPUs when a SMT sibling is detected. Note, this may overwrite
- * cpu_smt_control's previous setting.
+ * CPU identification. Called from architecture code.
*/
void __init cpu_smt_check_topology(void)
{
- if (!cpu_smt_available)
+ if (!topology_smt_supported())
cpu_smt_control = CPU_SMT_NOT_SUPPORTED;
}

@@ -409,18 +393,10 @@ early_param("nosmt", smt_cmdline_disable);

static inline bool cpu_smt_allowed(unsigned int cpu)
{
- if (topology_is_primary_thread(cpu))
+ if (cpu_smt_control == CPU_SMT_ENABLED)
return true;

- /*
- * If the CPU is not a 'primary' thread and the booted_once bit is
- * set then the processor has SMT support. Store this information
- * for the late check of SMT support in cpu_smt_check_topology().
- */
- if (per_cpu(cpuhp_state, cpu).booted_once)
- cpu_smt_available = true;
-
- if (cpu_smt_control == CPU_SMT_ENABLED)
+ if (topology_is_primary_thread(cpu))
return true;

/*
diff --git a/kernel/smp.c b/kernel/smp.c
index d86eec5f51c1..084c8b3a2681 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -584,8 +584,6 @@ void __init smp_init(void)
num_nodes, (num_nodes > 1 ? "s" : ""),
num_cpus, (num_cpus > 1 ? "s" : ""));

- /* Final decision about SMT support */
- cpu_smt_check_topology();
/* Any cleanup work */
smp_cpus_done(setup_max_cpus);
}