Re: [PATCH] x86/tsx: Use MSR_TSX_CTRL to clear CPUID bits

From: Borislav Petkov
Date: Tue Feb 15 2022 - 11:31:53 EST


On Tue, Feb 15, 2022 at 04:11:03AM -0800, Pawan Gupta wrote:
> That is exactly what this patch is fixing. Please let me know if you
> have any questions.

Just one: does the explanation I've written for this mess, sound about
right?

I'd like for this to be documented so that I don't scratch my head again
when looking at this again later.

Btw, lemme add Cooper to Cc to doublecheck me - he usually knows those
things.

Thx.

---
From: Pawan Gupta <pawan.kumar.gupta@xxxxxxxxxxxxxxx>
Date: Wed, 9 Feb 2022 13:04:36 -0800
Subject: [PATCH] x86/tsx: Use MSR_TSX_CTRL to clear CPUID bits

tsx_clear_cpuid() uses MSR_TSX_FORCE_ABORT to clear CPUID.RTM and
CPUID.HLE. Not all CPUs support MSR_TSX_FORCE_ABORT, alternatively use
MSR_IA32_TSX_CTRL when supported.

[ bp: Document how and why TSX gets disabled. ]

Fixes: 293649307ef9 ("x86/tsx: Clear CPUID bits when TSX always force aborts")
Reported-by: kernel test robot <lkp@xxxxxxxxx>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@xxxxxxxxxxxxxxx>
Signed-off-by: Borislav Petkov <bp@xxxxxxx>
Tested-by: Neelima Krishnan <neelima.krishnan@xxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx>
Link: https://lore.kernel.org/r/5bd785a1d6ea0b572250add0c6617b4504bc24d1.1644440311.git.pawan.kumar.gupta@xxxxxxxxxxxxxxx
---
arch/x86/kernel/cpu/intel.c | 1 +
arch/x86/kernel/cpu/tsx.c | 54 ++++++++++++++++++++++++++++++++-----
2 files changed, 48 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 8321c43554a1..8abf995677a4 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -722,6 +722,7 @@ static void init_intel(struct cpuinfo_x86 *c)
else if (tsx_ctrl_state == TSX_CTRL_DISABLE)
tsx_disable();
else if (tsx_ctrl_state == TSX_CTRL_RTM_ALWAYS_ABORT)
+ /* See comment over that function for more details. */
tsx_clear_cpuid();

split_lock_init();
diff --git a/arch/x86/kernel/cpu/tsx.c b/arch/x86/kernel/cpu/tsx.c
index 9c7a5f049292..2835fa89fc6f 100644
--- a/arch/x86/kernel/cpu/tsx.c
+++ b/arch/x86/kernel/cpu/tsx.c
@@ -58,7 +58,7 @@ void tsx_enable(void)
wrmsrl(MSR_IA32_TSX_CTRL, tsx);
}

-static bool __init tsx_ctrl_is_supported(void)
+static bool tsx_ctrl_is_supported(void)
{
u64 ia32_cap = x86_read_arch_cap_msr();

@@ -84,6 +84,44 @@ static enum tsx_ctrl_states x86_get_tsx_auto_mode(void)
return TSX_CTRL_ENABLE;
}

+/*
+ * Disabling TSX is not a trivial business.
+ *
+ * First of all, there's a CPUID bit: X86_FEATURE_RTM_ALWAYS_ABORT
+ * which says that TSX is practically disabled (all transactions are
+ * aborted by default). When that bit is set, the kernel unconditionally
+ * disables TSX.
+ *
+ * In order to do that, however, it needs to dance a bit:
+ *
+ * 1. The first method to disable it is through MSR_TSX_FORCE_ABORT and
+ * the MSR is present only when *two* CPUID bits are set:
+ *
+ * - X86_FEATURE_RTM_ALWAYS_ABORT
+ * - X86_FEATURE_TSX_FORCE_ABORT
+ *
+ * 2. The second method is for CPUs which do not have the above-mentioned
+ * MSR: those use a different MSR - MSR_IA32_TSX_CTRL and disable TSX
+ * through that one. Those CPUs can also have the initially mentioned
+ * CPUID bit X86_FEATURE_RTM_ALWAYS_ABORT set and for those the same strategy
+ * applies: TSX gets disabled unconditionally.
+ *
+ * When either of the two methods are present, the kernel disables TSX and
+ * clears the respective RTM and HLE feature flags.
+ *
+ * An additional twist in the whole thing presents late microcode loading
+ * which, when done, may cause for the X86_FEATURE_RTM_ALWAYS_ABORT CPUID
+ * bit to be set after the update.
+ *
+ * A subsequent hotplug operation on any logical CPU except the BSP will
+ * cause for the supported CPUID feature bits to get re-detected and, if
+ * RTM and HLE get cleared all of a sudden, but, userspace did consult
+ * them before the update, then funny explosions will happen. Long story
+ * short: the kernel doesn't modify CPUID feature bits after booting.
+ *
+ * That's why, this function's call in init_intel() doesn't clear the
+ * feature flags.
+ */
void tsx_clear_cpuid(void)
{
u64 msr;
@@ -97,6 +135,10 @@ void tsx_clear_cpuid(void)
rdmsrl(MSR_TSX_FORCE_ABORT, msr);
msr |= MSR_TFA_TSX_CPUID_CLEAR;
wrmsrl(MSR_TSX_FORCE_ABORT, msr);
+ } else if (tsx_ctrl_is_supported()) {
+ rdmsrl(MSR_IA32_TSX_CTRL, msr);
+ msr |= TSX_CTRL_CPUID_CLEAR;
+ wrmsrl(MSR_IA32_TSX_CTRL, msr);
}
}

@@ -106,13 +148,11 @@ void __init tsx_init(void)
int ret;

/*
- * Hardware will always abort a TSX transaction if both CPUID bits
- * RTM_ALWAYS_ABORT and TSX_FORCE_ABORT are set. In this case, it is
- * better not to enumerate CPUID.RTM and CPUID.HLE bits. Clear them
- * here.
+ * Hardware will always abort a TSX transaction when CPUID
+ * RTM_ALWAYS_ABORT is set. In this case, it is better not to enumerate
+ * CPUID.RTM and CPUID.HLE bits. Clear them here.
*/
- if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT) &&
- boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
+ if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT)) {
tsx_ctrl_state = TSX_CTRL_RTM_ALWAYS_ABORT;
tsx_clear_cpuid();
setup_clear_cpu_cap(X86_FEATURE_RTM);
--
2.29.2

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette