[patch V4 00/30] x86/microcode: Cleanup and late loading enhancements

From: Thomas Gleixner
Date: Mon Oct 02 2023 - 07:59:41 EST


This is a follow up on:

https://lore.kernel.org/lkml/20230912065249.695681286@xxxxxxxxxxxxx

Late microcode loading is desired by enterprise users. Late loading is
problematic as it requires detailed knowledge about the change and an
analysis of whether this change modifies something which is already in use
by the kernel. Large enterprise customers have engineering teams and access
to deep technical vendor support. The regular admin does not have such
resources, so late loading has always tainted the kernel.

Intel recently started to use a previously reserved field in the microcode
header, which contains the minimal microcode revision that must be running
on the CPU to make the load safe. This field is 0 in all older microcode
revisions, which the kernel therefore treats as unsafe. Minimal revision
checking can be enforced via Kconfig or the kernel command line, in which
case the kernel refuses to load an unsafe revision. By default, unsafe
revisions are loaded as before and the kernel is tainted. If a safe
revision is loaded, the kernel is not tainted.
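The load decision described above can be modelled as a small pure function. This is a minimal sketch, not the code from the series: `struct ucode_hdr`, its `min_req_rev` field, `decide_late_load()` and the `enforce_minrev` flag are hypothetical names, and only the two header fields needed for the decision are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, trimmed-down view of a microcode header. Only the two
 * fields needed for the decision are modelled; the real layout differs.
 */
struct ucode_hdr {
	uint32_t rev;         /* revision contained in this update */
	uint32_t min_req_rev; /* minimal running revision; 0 in older updates */
};

enum late_load_action {
	LATE_LOAD_CLEAN,  /* safe: load, kernel stays untainted */
	LATE_LOAD_TAINT,  /* unsafe, not enforced: load and taint */
	LATE_LOAD_REFUSE, /* unsafe, enforced: do not load */
};

/* A zero field means "no minimal revision given" and is treated as unsafe. */
static bool late_load_is_safe(const struct ucode_hdr *hdr, uint32_t cur_rev)
{
	assert(hdr);
	return hdr->min_req_rev != 0 && cur_rev >= hdr->min_req_rev;
}

/*
 * @enforce_minrev stands in for the Kconfig / command line switch; the
 * name is an assumption, not the one used by the series.
 */
static enum late_load_action decide_late_load(const struct ucode_hdr *hdr,
					      uint32_t cur_rev,
					      bool enforce_minrev)
{
	if (late_load_is_safe(hdr, cur_rev))
		return LATE_LOAD_CLEAN;
	return enforce_minrev ? LATE_LOAD_REFUSE : LATE_LOAD_TAINT;
}
```

For example, with an update whose minimal revision is 0x2a, a CPU running 0x2b gets `LATE_LOAD_CLEAN`, while an older-style update with the field set to 0 yields `LATE_LOAD_TAINT`, or `LATE_LOAD_REFUSE` when enforcement is enabled. Keeping the check free of side effects makes it easy to test separately from the actual loading.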

But that does not solve the other known problems with late loading:

- Late loading on current Intel CPUs is unsafe vs. NMI when
hyperthreading is enabled. If an NMI hits the secondary sibling while
the primary loads the microcode, the machine can crash.

- Soft-offlined SMT siblings, which are playing dead in MWAIT, can cause
damage too when the microcode update modifies MWAIT. That's a
realistic scenario in the context of 'nosmt' mitigations. :(

Neither the core code nor the Intel specific code handles any of this at all.

While trying to implement this, I stumbled over dysfunctional, horribly
complex and redundant code, which I decided to clean up first so the new
functionality can be added on a clean slate.

So the series has several sections:

1) Move the 32-bit early loading to after paging is enabled

2) Cleanup of the Intel specific code

3) Implementation of proper core control logic to handle the NMI safety
requirements

4) Support for minimal revision check in the core and the Intel specific
parts.
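The ordering that the core control logic has to guarantee for one SMT core can be modelled in user space. The sketch below is only an illustration under stated assumptions: POSIX threads stand in for the two hyperthreads, a busy-wait loop stands in for the parked secondary, and `struct core_sync`, `rendezvous()` and `run_core_update()` are hypothetical names. It shows the sequence of rendezvous, update by the primary and release of the secondary; it does not model NMIs, which is exactly the part the kernel must handle in addition.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SIBLINGS 2

/*
 * Hypothetical user-space model of one SMT core: a primary thread that
 * performs the "update" and a secondary sibling that must stay parked in
 * a known loop until the primary is done. All names are illustrative.
 */
struct core_sync {
	atomic_int  arrived;   /* threads that reached the rendezvous */
	atomic_bool load_done; /* set by the primary after the update */
	int         core_rev;  /* plain data published via load_done */
	int         seen_rev;  /* revision the secondary observed */
};

/* Nobody proceeds until both siblings have arrived. */
static void rendezvous(struct core_sync *cs)
{
	atomic_fetch_add(&cs->arrived, 1);
	while (atomic_load(&cs->arrived) < NR_SIBLINGS)
		;
}

static void *primary_fn(void *arg)
{
	struct core_sync *cs = arg;

	rendezvous(cs);
	cs->core_rev = 0x2d;                /* stands in for the actual load */
	atomic_store(&cs->load_done, true); /* seq_cst store publishes core_rev */
	return NULL;
}

static void *secondary_fn(void *arg)
{
	struct core_sync *cs = arg;

	rendezvous(cs);
	/* Parked: run nothing but this loop while the shared core is updated. */
	while (!atomic_load(&cs->load_done))
		;
	cs->seen_rev = cs->core_rev;        /* ordered after load_done */
	return NULL;
}

/* Returns the revision the secondary saw after release, or -1 on error. */
static int run_core_update(struct core_sync *cs, int old_rev)
{
	pthread_t primary, secondary;

	atomic_init(&cs->arrived, 0);
	atomic_init(&cs->load_done, false);
	cs->core_rev = old_rev;
	cs->seen_rev = -1;

	if (pthread_create(&primary, NULL, primary_fn, cs))
		return -1;
	if (pthread_create(&secondary, NULL, secondary_fn, cs)) {
		/* Let the primary pass the rendezvous alone, then reap it. */
		atomic_fetch_add(&cs->arrived, 1);
		pthread_join(primary, NULL);
		return -1;
	}
	pthread_join(primary, NULL);
	pthread_join(secondary, NULL);
	assert(atomic_load(&cs->load_done));
	return cs->seen_rev;
}
```

Calling `run_core_update(&cs, 0x2b)` returns 0x2d: the secondary only reads the shared state after the primary's sequentially consistent store to `load_done`, so in this model it cannot observe the core in the middle of the update.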

Changes vs. V3:

- Rebased on v6.6-rc1

- Remove from the AMD code the early load magic which was required for
physical address mode.

- Address the review comments from Borislav, which are mostly about naming,
comments and change logs. No functional changes vs. V3.

The series is also available from git:

git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git ucode-v4

Thanks,

tglx
---
Documentation/admin-guide/kernel-parameters.txt | 5
arch/x86/Kconfig | 25
arch/x86/include/asm/apic.h | 5
arch/x86/include/asm/cpu.h | 20
arch/x86/include/asm/microcode.h | 19
arch/x86/kernel/Makefile | 1
arch/x86/kernel/apic/apic_flat_64.c | 2
arch/x86/kernel/apic/ipi.c | 8
arch/x86/kernel/apic/x2apic_cluster.c | 1
arch/x86/kernel/apic/x2apic_phys.c | 1
arch/x86/kernel/cpu/common.c | 12
arch/x86/kernel/cpu/microcode/amd.c | 129 +---
arch/x86/kernel/cpu/microcode/core.c | 637 ++++++++++++++--------
arch/x86/kernel/cpu/microcode/intel.c | 682 +++++++-----------------
arch/x86/kernel/cpu/microcode/internal.h | 32 -
arch/x86/kernel/head32.c | 6
arch/x86/kernel/head_32.S | 10
arch/x86/kernel/nmi.c | 9
arch/x86/kernel/smpboot.c | 12
drivers/platform/x86/intel/ifs/load.c | 8
include/linux/cpuhotplug.h | 1
21 files changed, 788 insertions(+), 837 deletions(-)