Re: [PATCH 05/35] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states

From: Dave Hansen
Date: Mon Feb 07 2022 - 20:05:14 EST


On 1/30/22 13:18, Rick Edgecombe wrote:
> From: Yu-cheng Yu <yu-cheng.yu@xxxxxxxxx>
>
> Control-flow Enforcement Technology (CET) introduces these MSRs:
>
> MSR_IA32_U_CET (user-mode CET settings),
> MSR_IA32_PL3_SSP (user-mode shadow stack pointer),
>
> MSR_IA32_PL0_SSP (kernel-mode shadow stack pointer),
> MSR_IA32_PL1_SSP (Privilege Level 1 shadow stack pointer),
> MSR_IA32_PL2_SSP (Privilege Level 2 shadow stack pointer),
> MSR_IA32_S_CET (kernel-mode CET settings),
> MSR_IA32_INT_SSP_TAB (exception shadow stack table).

To be honest, I'm not sure this is very valuable. It's *VERY* close to
the exact information in the structure definitions. It's also not
obviously related to XSAVE. It's more of the "what" this patch does
than the "why". Good changelogs talk about "why".

> The two user-mode MSRs belong to XFEATURE_CET_USER. The first three of
> kernel-mode MSRs belong to XFEATURE_CET_KERNEL. Both XSAVES states are
> supervisor states. This means that there is no direct, unprivileged access
> to these states, making it harder for an attacker to subvert CET.

Forgive me while I go into changelog lecture mode for a moment.

I was constantly looking up at the list of MSRs and trying to reconcile
them with this paragraph. Imagine if you had started out this changelog
by saying:

Shadow stack register state can be managed with XSAVE. The
registers can logically be separated into two groups:

* Registers controlling user-mode operation
* Registers controlling kernel-mode operation

The architecture has two new XSAVE state components: one for
each group of registers. This _lets_ an OS manage them
separately if it chooses. Linux chooses to ... <explain the
design choice here, or why we don't care yet>.

Both XSAVE state components are supervisor states, even the
state controlling user-mode operation. This is a departure from
earlier features like protection keys where the PKRU state is
a normal user (non-supervisor) state. Having the user state be
supervisor-managed ensures there is no direct, unprivileged
access to it, making it harder for an attacker to subvert CET.

Also, IBT gunk is in here too, right? Let's at least *mention* that in
the changelog.

...
> /* All supervisor states including supported and unsupported states. */
> #define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 3faf0f97edb1..0ee77ce4c753 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -362,6 +362,26 @@
>
>
> #define MSR_CORE_PERF_LIMIT_REASONS 0x00000690
> +
> +/* Control-flow Enforcement Technology MSRs */
> +#define MSR_IA32_U_CET 0x000006a0 /* user mode cet setting */
> +#define MSR_IA32_S_CET 0x000006a2 /* kernel mode cet setting */
> +#define CET_SHSTK_EN BIT_ULL(0)
> +#define CET_WRSS_EN BIT_ULL(1)
> +#define CET_ENDBR_EN BIT_ULL(2)
> +#define CET_LEG_IW_EN BIT_ULL(3)
> +#define CET_NO_TRACK_EN BIT_ULL(4)
> +#define CET_SUPPRESS_DISABLE BIT_ULL(5)
> +#define CET_RESERVED (BIT_ULL(6) | BIT_ULL(7) | BIT_ULL(8) | BIT_ULL(9))

Would GENMASK_ULL() look any nicer here? I guess it's pretty clear
as-is that bits 6->9 are reserved.

> +#define CET_SUPPRESS BIT_ULL(10)
> +#define CET_WAIT_ENDBR BIT_ULL(11)

Are those bit fields common for both registers? It might be worth a
comment to mention that.

> +#define MSR_IA32_PL0_SSP 0x000006a4 /* kernel shadow stack pointer */
> +#define MSR_IA32_PL1_SSP 0x000006a5 /* ring-1 shadow stack pointer */
> +#define MSR_IA32_PL2_SSP 0x000006a6 /* ring-2 shadow stack pointer */

Are PL1/2 ever used in this implementation? If not, let's axe these
definitions.

> +#define MSR_IA32_PL3_SSP 0x000006a7 /* user shadow stack pointer */
> +#define MSR_IA32_INT_SSP_TAB 0x000006a8 /* exception shadow stack table */
> +
> #define MSR_GFX_PERF_LIMIT_REASONS 0x000006B0
> #define MSR_RING_PERF_LIMIT_REASONS 0x000006B1
>
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 02b3ddaf4f75..44397202762b 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -50,6 +50,8 @@ static const char *xfeature_names[] =
> "Processor Trace (unused)" ,
> "Protection Keys User registers",
> "PASID state",
> + "Control-flow User registers" ,
> + "Control-flow Kernel registers" ,
> "unknown xstate feature" ,
> "unknown xstate feature" ,
> "unknown xstate feature" ,
> @@ -73,6 +75,8 @@ static unsigned short xsave_cpuid_features[] __initdata = {
> [XFEATURE_PT_UNIMPLEMENTED_SO_FAR] = X86_FEATURE_INTEL_PT,
> [XFEATURE_PKRU] = X86_FEATURE_PKU,
> [XFEATURE_PASID] = X86_FEATURE_ENQCMD,
> + [XFEATURE_CET_USER] = X86_FEATURE_SHSTK,
> + [XFEATURE_CET_KERNEL] = X86_FEATURE_SHSTK,
> [XFEATURE_XTILE_CFG] = X86_FEATURE_AMX_TILE,
> [XFEATURE_XTILE_DATA] = X86_FEATURE_AMX_TILE,
> };
> @@ -250,6 +254,8 @@ static void __init print_xstate_features(void)
> print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
> print_xstate_feature(XFEATURE_MASK_PKRU);
> print_xstate_feature(XFEATURE_MASK_PASID);
> + print_xstate_feature(XFEATURE_MASK_CET_USER);
> + print_xstate_feature(XFEATURE_MASK_CET_KERNEL);
> print_xstate_feature(XFEATURE_MASK_XTILE_CFG);
> print_xstate_feature(XFEATURE_MASK_XTILE_DATA);
> }
> @@ -405,6 +411,7 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
> XFEATURE_MASK_BNDREGS | \
> XFEATURE_MASK_BNDCSR | \
> XFEATURE_MASK_PASID | \
> + XFEATURE_MASK_CET_USER | \
> XFEATURE_MASK_XTILE)
>
> /*
> @@ -621,6 +628,8 @@ static bool __init check_xstate_against_struct(int nr)
> XCHECK_SZ(sz, nr, XFEATURE_PKRU, struct pkru_state);
> XCHECK_SZ(sz, nr, XFEATURE_PASID, struct ia32_pasid_state);
> XCHECK_SZ(sz, nr, XFEATURE_XTILE_CFG, struct xtile_cfg);
> + XCHECK_SZ(sz, nr, XFEATURE_CET_USER, struct cet_user_state);
> + XCHECK_SZ(sz, nr, XFEATURE_CET_KERNEL, struct cet_kernel_state);
>
> /* The tile data size varies between implementations. */
> if (nr == XFEATURE_XTILE_DATA)
> @@ -634,7 +643,9 @@ static bool __init check_xstate_against_struct(int nr)
> if ((nr < XFEATURE_YMM) ||
> (nr >= XFEATURE_MAX) ||
> (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) ||
> - ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_RSRVD_COMP_16))) {
> + (nr == XFEATURE_RSRVD_COMP_13) ||
> + (nr == XFEATURE_RSRVD_COMP_14) ||
> + (nr == XFEATURE_RSRVD_COMP_16)) {
> WARN_ONCE(1, "no structure for xstate: %d\n", nr);
> XSTATE_WARN_ON(1);
> return false;

That if() is getting unweildy. While I generally despise macros
implicitly modifying variables, this might be worth it. We could have a
local function variable:

bool feature_checked = false;

and then muck with it in the macro:

#define XCHECK_SZ(sz, nr, nr_macro, __struct) do {
if (nr == nr_macro)) {
feature_checked = true;
if (WARN_ONCE(sz != sizeof(__struct), ... ) {
__xstate_dump_leaves();
}
}
} while (0)

Then the if() just makes sure the feature was checked instead of
checking for reserved features explicitly. We could also do:

bool c = false;

...

c |= XCHECK_SZ(sz, nr, XFEATURE_YMM, struct ymmh_struct);
c |= XCHECK_SZ(sz, nr, XFEATURE_BNDREGS, struct ...
c |= XCHECK_SZ(sz, nr, XFEATURE_BNDCSR, struct ...
...

but that starts to run into 80 columns. Those are both nice because
they mean you don't have to maintain a list of reserved features in the
code. Another option would be to define a:

bool xfeature_is_reserved(int nr)
{
switch (nr) {
case XFEATURE_RSRVD_COMP_13:
...

so the if() looks nicer and won't grow; the function will grow instead.

Either way, I think this needs some refactoring.