[PATCH v3 00/21] Enable CET Virtualization

From: Yang Weijiang
Date: Thu May 11 2023 - 03:13:45 EST


Control-flow Enforcement Technology (CET) is a CPU feature used to prevent
Return/Jump-Oriented Programming (ROP/JOP) attacks. CET introduces a new
exception type, Control Protection (#CP), and two sub-features (SHSTK, IBT)
to defend against ROP/JOP style control-flow subversion attacks.

Shadow Stack (SHSTK):
A shadow stack is a second stack used exclusively for control transfer
operations. The shadow stack is separate from the data/normal stack and
can be enabled individually in user and kernel mode. When shadow stack
is enabled, CALL pushes the return address on both the data and shadow
stacks. RET pops the return address from both stacks and compares them.
If the return addresses from the two stacks do not match, the processor
generates a #CP.
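
As a rough illustration of the CALL/RET check described above, the following
userspace C sketch models the two stacks in software. It is a toy model only,
not how the hardware or KVM implements SHSTK; every name in it (struct
cpu_model, model_call(), model_ret(), MODEL_STACK_DEPTH) is made up for this
example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_STACK_DEPTH 16

/* Hypothetical toy model of a CPU with a data stack and a shadow stack. */
struct cpu_model {
	uint64_t data_stack[MODEL_STACK_DEPTH];
	uint64_t shadow_stack[MODEL_STACK_DEPTH];
	size_t depth;	/* number of return addresses currently pushed */
};

/* CALL: push the return address on both stacks. Returns false on overflow. */
static bool model_call(struct cpu_model *cpu, uint64_t ret_addr)
{
	if (cpu->depth >= MODEL_STACK_DEPTH)
		return false;
	cpu->data_stack[cpu->depth] = ret_addr;
	cpu->shadow_stack[cpu->depth] = ret_addr;
	cpu->depth++;
	return true;
}

/*
 * RET: pop both stacks and compare the two return addresses.
 * Returns true and stores the address in *ret_addr when they match.
 * Returns false when the stacks are empty or when the addresses differ,
 * which is the case where real hardware would generate a #CP.
 */
static bool model_ret(struct cpu_model *cpu, uint64_t *ret_addr)
{
	if (cpu->depth == 0)
		return false;
	cpu->depth--;
	if (cpu->data_stack[cpu->depth] != cpu->shadow_stack[cpu->depth])
		return false;	/* mismatch: models a #CP */
	*ret_addr = cpu->data_stack[cpu->depth];
	return true;
}
```

Overwriting an entry in data_stack[] between model_call() and model_ret()
mimics a ROP-style return address overwrite, and model_ret() then fails.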

Indirect Branch Tracking (IBT):
IBT adds a new instruction, ENDBRANCH, to mark valid target addresses of
indirect branches (CALL, JMP, etc.). If an indirect branch is executed
and the next instruction is _not_ an ENDBRANCH, the processor generates a
#CP. This instruction behaves as a NOP on platforms that don't support
CET.
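
The IBT rule above can be pictured as a two-state tracker. The sketch below
is a simplified software model under that assumption, not the hardware
implementation; the enum and function names (model_insn, ibt_state,
ibt_step()) are invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical instruction classes for this toy model. */
enum model_insn {
	INSN_OTHER,
	INSN_INDIRECT_BRANCH,	/* indirect CALL or JMP */
	INSN_ENDBRANCH,
};

enum ibt_state {
	IBT_IDLE,
	IBT_WAIT_FOR_ENDBRANCH,
};

/*
 * Feed the next executed instruction into the tracker.
 * Returns false when the model would raise a #CP, i.e. when the
 * instruction following an indirect branch is not an ENDBRANCH.
 */
static bool ibt_step(enum ibt_state *state, enum model_insn insn)
{
	if (*state == IBT_WAIT_FOR_ENDBRANCH && insn != INSN_ENDBRANCH)
		return false;

	/* An ENDBRANCH seen while idle has no effect, like a NOP. */
	if (insn == INSN_INDIRECT_BRANCH)
		*state = IBT_WAIT_FOR_ENDBRANCH;
	else
		*state = IBT_IDLE;
	return true;
}
```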


Dependency:
--------------------------------------------------------------------------
The first 5 patches are taken from the CET native series [1] in linux-next.
They're prerequisites for enabling guest user mode SHSTK. Apply this full
series before building the host kernel for guest CET testing. Also apply the
CET enabling patches in [2] to build a suitable QEMU. These kernel dependency
patches will be carried in the KVM series until the CET native series is
merged into the mainline tree.


Implementation:
--------------------------------------------------------------------------
Historically, the early KVM patches supported both user SHSTK and IBT, and
most of them are carried forward with changes in this new series. With the
kernel IBT feature merged in 5.18, a new patch was added to support the
feature in the guest. The last patch is introduced to support supervisor
SHSTK, but the feature is not enabled on Intel platforms for now; the main
purpose of this patch is to help AMD folks enable the feature.

In summary, this new series enables CET user SHSTK/IBT and kernel IBT, but
doesn't fully support CET supervisor SHSTK; that enabling work is left for
the future.

Supported CET sub-features:

                  |
    User SHSTK    |    User IBT      (user mode)
--------------------------------------------------
    s-SHSTK (X)   |    Kernel IBT    (kernel mode)
                  |

Guest user mode SHSTK/IBT relies on host side XSAVES support (XSS[bit 11])
to swap CET states. Guest kernel IBT has no dependency on host XSAVES.
Supervisor SHSTK relies on host side XSAVES support (XSS[bit 12]) to
save/restore supervisor mode CET states.
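
The XSS dependencies above can be summarized in a few lines of C. This is
only a sketch of the dependency rules as stated in this cover letter, not
code from the series: the bit positions come from the text, while the macro,
struct and function names are hypothetical, and the CPU is assumed to
support the CET features themselves.

```c
#include <stdbool.h>
#include <stdint.h>

/* XSS bit positions as given in the text; macro names are local. */
#define MODEL_XSS_CET_USER	(1ULL << 11)
#define MODEL_XSS_CET_KERNEL	(1ULL << 12)

/* Hypothetical summary of what the host XSS setting allows for a guest. */
struct model_cet_caps {
	bool user_shstk;	/* needs XSS[bit 11] */
	bool user_ibt;		/* needs XSS[bit 11] */
	bool kernel_ibt;	/* no XSAVES dependency */
	bool supervisor_shstk;	/* needs XSS[bit 12], not enabled for now */
};

static struct model_cet_caps model_cet_caps_from_xss(uint64_t host_xss)
{
	struct model_cet_caps caps = {
		.user_shstk = !!(host_xss & MODEL_XSS_CET_USER),
		.user_ibt = !!(host_xss & MODEL_XSS_CET_USER),
		.kernel_ibt = true,
		.supervisor_shstk = !!(host_xss & MODEL_XSS_CET_KERNEL),
	};

	return caps;
}
```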

This version removes unnecessary checks of the host CET enabling status before
exposing CET features to the guest, decoupling guest CET enabling from the
host. This is expected to be friendlier to cloud computing scenarios.


CET states management:
--------------------------------------------------------------------------
CET user mode states, MSR_IA32_{U_CET,PL3_SSP}, depend on the {XSAVES,XRSTORS}
instructions to swap guest/host context when vm-exit/vm-entry happens.
On vm-exit, the guest CET states are stored to the guest fpu area and the host
user mode states are loaded from thread/process context before the vCPU
returns to userspace, and vice versa on vm-entry. See details in
kvm_{load|put}_guest_fpu(). So the user mode state validity depends on the
host side U_CET bit being set in MSR_XSS.

CET supervisor mode states are grouped into two categories: XSAVES dependent
and non-dependent. The former includes MSR_IA32_PL{0,1,2}_SSP; the latter
consists of MSR_IA32_S_CET and MSR_IA32_INTR_SSP_TBL. Save/restore of the
XSAVES dependent MSRs depends on the S_CET bit being set in MSR_XSS. Since the
native series doesn't enable S_CET support, these s-SHSTK shadow stack
pointers are invalid.

New VMCS fields, {GUEST|HOST}_{S_CET,SSP,INTR_SSP_TABL}, are introduced to
switch the guest/host states that are not managed by XSAVES. When the CET
entry/exit load bits are set, the guest MSR_IA32_{S_CET,INTR_SSP_TBL,SSP} are
loaded from these fields at vm-entry and the host ones at vm-exit. With these
new fields, the current guest kernel IBT enabling doesn't depend on the S_CET
bit in XSS, i.e., on host {XSAVES|XRSTORS} support.
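
To make the grouping in this section easier to follow, here is a small C
sketch that classifies each CET MSR by how its guest/host value is switched.
It only restates the rules above and is not taken from the patches. The MSR
indices are the ones listed in the Intel SDM; the MODEL_* macro names, the
enum and model_cet_msr_swap() are assumptions made for this example.

```c
#include <stdint.h>

/* MSR indices per the Intel SDM; macro names are local to this sketch. */
#define MODEL_MSR_U_CET		0x6a0
#define MODEL_MSR_S_CET		0x6a2
#define MODEL_MSR_PL0_SSP	0x6a4
#define MODEL_MSR_PL1_SSP	0x6a5
#define MODEL_MSR_PL2_SSP	0x6a6
#define MODEL_MSR_PL3_SSP	0x6a7
#define MODEL_MSR_INTR_SSP_TBL	0x6a8

enum model_cet_swap {
	CET_SWAP_NOT_CET,	/* not a CET MSR */
	CET_SWAP_XSAVES_USER,	/* XSAVES/XRSTORS, needs U_CET bit in XSS */
	CET_SWAP_XSAVES_KERNEL,	/* XSAVES/XRSTORS, needs S_CET bit in XSS */
	CET_SWAP_VMCS,		/* loaded from VMCS guest/host fields */
};

/* Return the mechanism that switches the given MSR between guest and host. */
static enum model_cet_swap model_cet_msr_swap(uint32_t msr)
{
	switch (msr) {
	case MODEL_MSR_U_CET:
	case MODEL_MSR_PL3_SSP:
		return CET_SWAP_XSAVES_USER;
	case MODEL_MSR_PL0_SSP:
	case MODEL_MSR_PL1_SSP:
	case MODEL_MSR_PL2_SSP:
		return CET_SWAP_XSAVES_KERNEL;
	case MODEL_MSR_S_CET:
	case MODEL_MSR_INTR_SSP_TBL:
		return CET_SWAP_VMCS;
	default:
		return CET_SWAP_NOT_CET;
	}
}
```

Kernel IBT only needs MSR_IA32_S_CET, which falls in the VMCS group, so it
works without the S_CET bit in XSS.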


Tests:
--------------------------------------------------------------------------
This series passed the basic CET user shadow stack test and kernel IBT test in
L1 and L2 guests. It also works with the CET KVM-unit-test application.

All KVM-unit-test cases and KVM selftests were executed against this series.
All test cases passed except the vmx test; the failure is due to CR4_CET bit
testing in test_vmxon_bad_cr(). After adding the CR4_CET bit to the skip list,
the test passed. I'll send a patch to fix this issue later.


To run the user shadow stack test and kernel IBT test in a VM, you need a
CET-capable platform, e.g., a Sapphire Rapids server, and must follow the
steps below to build the host/guest kernels properly:

1. Build host kernel. Apply this series to the kernel tree and build the kernel.

2. Build guest kernel. Apply the CET native series to the kernel tree and
enable the CONFIG_X86_KERNEL_IBT and CONFIG_X86_USER_SHADOW_STACK options.
Build with a CET-enabled gcc version (>= 8.5.0).

3. Use patched QEMU to launch a VM.

Check kernel selftest test_shadow_stack_64 output:

[INFO] new_ssp = 7f8c82100ff8, *new_ssp = 7f8c82101001
[INFO] changing ssp from 7f8c82900ff0 to 7f8c82100ff8
[INFO] ssp is now 7f8c82101000
[OK] Shadow stack pivot
[OK] Shadow stack faults
[INFO] Corrupting shadow stack
[INFO] Generated shadow stack violation successfully
[OK] Shadow stack violation test
[INFO] Gup read -> shstk access success
[INFO] Gup write -> shstk access success
[INFO] Violation from normal write
[INFO] Gup read -> write access success
[INFO] Violation from normal write
[INFO] Gup write -> write access success
[INFO] Cow gup write -> write access success
[OK] Shadow gup test
[INFO] Violation from shstk access
[OK] mprotect() test
[SKIP] Userfaultfd unavailable.
[OK] 32 bit test


Check kernel IBT with dmesg | grep CET:

CET detected: Indirect Branch Tracking enabled
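
Besides dmesg, it can be handy to check from guest userspace whether the
vCPU advertises the CET sub-features at all. The following C sketch reads
CPUID leaf 7, subleaf 0, where ECX bit 7 reports shadow stack and EDX bit 20
reports IBT. It only shows what CPUID enumerates, not whether the kernel has
actually enabled the feature. The helper names (cpuid7_has_shstk(),
cpuid7_has_ibt(), probe_cet()) are made up for this example;
__get_cpuid_count() comes from the compiler's <cpuid.h>.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID.(EAX=7,ECX=0): ECX bit 7 = CET_SS, EDX bit 20 = CET_IBT. */
static bool cpuid7_has_shstk(uint32_t ecx)
{
	return !!(ecx & (1u << 7));
}

static bool cpuid7_has_ibt(uint32_t edx)
{
	return !!(edx & (1u << 20));
}

/*
 * Query the current CPU. Returns false if CPUID leaf 7 is unavailable
 * or when built for a non-x86 target; *shstk and *ibt are only written
 * on success.
 */
static bool probe_cet(bool *shstk, bool *ibt)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return false;
	*shstk = cpuid7_has_shstk(ecx);
	*ibt = cpuid7_has_ibt(edx);
	return true;
#else
	(void)shstk;
	(void)ibt;
	return false;
#endif
}
```

A caller would invoke probe_cet(&shstk, &ibt) inside the guest and print the
two flags; the result depends on the CPU model exposed by QEMU.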

--------------------------------------------------------------------------
Changes in v3:
1. Moved MSR access check helper to x86 common file. [Mike]
2. Modified cover letter, commit logs and code per review comments. [PeterZ, Binbin, Rick]
3. Fixed an issue on host MSR_IA32_S_CET reload at vm-exit.
5. Rebase on kvm-x86/next [4].


[1]: linux-next: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/?h=next-20230420
[2]: QEMU patch: https://lore.kernel.org/all/20230421041227.90915-1-weijiang.yang@xxxxxxxxx/
[3]: v2 patchset: https://lore.kernel.org/all/20230421134615.62539-1-weijiang.yang@xxxxxxxxx/
[4]: Rebase branch: https://github.com/kvm-x86/linux.git, commit: 5c291b93e5d6 (tag: kvm-x86-next-2023.04.26)


Rick Edgecombe (5):
x86/shstk: Add Kconfig option for shadow stack
x86/cpufeatures: Add CPU feature flags for shadow stacks
x86/cpufeatures: Enable CET CR4 bit for shadow stack
x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states
x86/fpu: Add helper for modifying xstate

Sean Christopherson (2):
KVM:x86: Report XSS as to-be-saved if there are supported features
KVM:x86: Load guest FPU state when accessing xsaves-managed MSRs

Yang Weijiang (14):
KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
KVM:x86: Init kvm_caps.supported_xss with supported feature bits
KVM:x86: Add #CP support in guest exception classification
KVM:VMX: Introduce CET VMCS fields and control bits
KVM:x86: Add fault checks for guest CR4.CET setting
KVM:VMX: Emulate reads and writes to CET MSRs
KVM:VMX: Add a synthetic MSR to allow userspace to access GUEST_SSP
KVM:x86: Report CET MSRs as to-be-saved if CET is supported
KVM:x86: Save/Restore GUEST_SSP to/from SMM state save area
KVM:VMX: Pass through user CET MSRs to the guest
KVM:x86: Enable CET virtualization for VMX and advertise to userspace
KVM:nVMX: Enable user CET support for nested VMX
KVM:x86: Enable kernel IBT support for guest
KVM:x86: Support CET supervisor shadow stack MSR access

arch/x86/Kconfig | 24 +++++
arch/x86/Kconfig.assembler | 5 +
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/fpu/api.h | 9 ++
arch/x86/include/asm/fpu/types.h | 16 ++-
arch/x86/include/asm/fpu/xstate.h | 6 +-
arch/x86/include/asm/kvm_host.h | 3 +-
arch/x86/include/asm/vmx.h | 8 ++
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/include/uapi/asm/kvm_para.h | 1 +
arch/x86/kernel/cpu/common.c | 35 +++++--
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/fpu/core.c | 19 ++++
arch/x86/kernel/fpu/xstate.c | 90 ++++++++--------
arch/x86/kvm/cpuid.c | 19 +++-
arch/x86/kvm/cpuid.h | 6 ++
arch/x86/kvm/smm.c | 20 ++++
arch/x86/kvm/vmx/capabilities.h | 4 +
arch/x86/kvm/vmx/nested.c | 29 +++++-
arch/x86/kvm/vmx/vmcs12.c | 6 ++
arch/x86/kvm/vmx/vmcs12.h | 14 ++-
arch/x86/kvm/vmx/vmx.c | 124 ++++++++++++++++++++++-
arch/x86/kvm/vmx/vmx.h | 6 +-
arch/x86/kvm/x86.c | 122 ++++++++++++++++++++--
arch/x86/kvm/x86.h | 47 ++++++++-
26 files changed, 543 insertions(+), 82 deletions(-)


base-commit: 5c291b93e5d665380dbecc6944973583f9565ee5
--
2.27.0