Re: [PATCH v5 09/19] KVM:x86: Make guest supervisor states as non-XSAVE managed

From: Paolo Bonzini
Date: Thu Aug 10 2023 - 11:16:20 EST


On 8/10/23 16:29, Dave Hansen wrote:
On 8/10/23 02:29, Yang, Weijiang wrote:
...
When KVM enumerates shadow stack support for the guest in CPUID(0x7,
0).ECX[bit7], architecturally it claims that both user and supervisor
mode shadow stacks are supported. Although the latter is not supported
in Linux, in the virtualization world the guest OS could be a non-Linux
system, so KVM supervisor state support is necessary in this case.

What actual OSes need this support?

I think Xen could use it when running nested. But KVM cannot expose support for CET in CPUID, and at the same time fake support for MSR_IA32_PL{0,1,2}_SSP (e.g. inject a #GP if it's ever written to a nonzero value).
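
To make the objection concrete, here is a minimal sketch of what such "fake" support would amount to. The helper name is hypothetical and this is not KVM code; only the MSR indices are architectural (Intel SDM). A guest that trusts the CPUID bit and programs a real supervisor shadow stack pointer would take an unexpected #GP.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural MSR indices (Intel SDM). */
#define MSR_IA32_PL0_SSP 0x6a4u
#define MSR_IA32_PL1_SSP 0x6a5u
#define MSR_IA32_PL2_SSP 0x6a6u

/*
 * Hypothetical helper, not an existing KVM function: decide whether a
 * guest WRMSR should fault under the "fake support" scheme. Writes of
 * zero to PL{0,1,2}_SSP are silently accepted; any nonzero value would
 * get a #GP. Other MSRs are out of scope and never fault here.
 */
static inline bool fake_pl_ssp_write_faults(uint32_t msr, uint64_t data)
{
	switch (msr) {
	case MSR_IA32_PL0_SSP:
	case MSR_IA32_PL1_SSP:
	case MSR_IA32_PL2_SSP:
		return data != 0;
	default:
		return false;
	}
}
```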

I suppose we could invent our own paravirtualized CPUID bit for "supervisor IBT works but supervisor SHSTK doesn't". Linux could check that but I don't think it's a good idea.

So... do, or do not. There is no try. :)

Two solutions are on the table:
1) Enable CET supervisor support in Linux kernel like user mode support.

We _will_ do this eventually, but not until FRED is merged. The core
kernel also probably won't be managing the MSRs on non-FRED hardware.

I think what you're really talking about here is that the kernel would
enable CET_S XSAVE state management so that CET_S state could be managed
by the core kernel's FPU code.

Yes, I understand it that way too.

That is, frankly, *NOT* like the user mode support at all.

I agree.

2) Enable support in KVM domain.

Problem:
The pros/cons of each solution (my personal thoughts):
In kernel solution:
Pros:
- Avoids saving/restoring 3 supervisor MSRs (PL{0,1,2}_SSP) in the vCPU
execution path.
- Makes it easy for KVM to manage the guest's CET xstate bits.
Cons:
- Unnecessary supervisor state XSAVES/XRSTORS operations for non-vCPU
threads.

What operations would be unnecessary exactly?

Saving/restoring PL0/1/2_SSP when switching from one usermode task's fpstate to another.

KVM solution:
Pros:
- Does not touch the current kernel FPU management framework and logic.
- No extra space or operations for non-vCPU threads.
Cons:
- Manually saving/restoring 3 supervisor MSRs is a performance burden
for KVM.
- It looks more like a hack in KVM, and some of the handling logic
seems a bit awkward.

In a perfect world, we'd just allocate space for CET_S in the KVM
fpstates. The core kernel fpstates would have
XSTATE_BV[12]==XCOMP_BV[12]==0. An XRSTOR of the core kernel fpstates
would just set CET_S to its init state.

Yep. I don't think it's a lot of work to implement. The basic idea, as you point out below, is something like:

#define XFEATURE_MASK_USER_DYNAMIC	XFEATURE_MASK_XTILE_DATA
#define XFEATURE_MASK_USER_OPTIONAL \
	(XFEATURE_MASK_USER_DYNAMIC | XFEATURE_MASK_CET_KERNEL)

where XFEATURE_MASK_USER_DYNAMIC is used for xfd-related tasks (including the ARCH_GET_XCOMP_SUPP arch_prctl) but everything else uses XFEATURE_MASK_USER_OPTIONAL.

KVM would enable the feature by hand when allocating the guest fpstate. Disabled features would be cleared from EDX:EAX when calling XSAVE/XSAVEC/XSAVES.
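
A rough sketch of that masking step, for illustration only. The macro names follow the snippet above and the bit positions are the architectural ones (CET_S is bit 12, XTILE_DATA is bit 18); `struct fpstate_sketch`, `xsave_rfbm()` and `split_edx_eax()` are made-up names, not existing kernel interfaces.

```c
#include <stdint.h>

/* Architectural XSAVE feature bits. */
#define XFEATURE_MASK_CET_KERNEL	(1ULL << 12)
#define XFEATURE_MASK_XTILE_DATA	(1ULL << 18)

#define XFEATURE_MASK_USER_DYNAMIC	XFEATURE_MASK_XTILE_DATA
#define XFEATURE_MASK_USER_OPTIONAL \
	(XFEATURE_MASK_USER_DYNAMIC | XFEATURE_MASK_CET_KERNEL)

/* Hypothetical per-fpstate record of which optional features are enabled. */
struct fpstate_sketch {
	uint64_t enabled_optional;
};

/*
 * Compute the requested-feature bitmap for XSAVE/XSAVEC/XSAVES:
 * start from everything the host supports, then clear the optional
 * features that were not enabled for this particular fpstate.
 */
static inline uint64_t xsave_rfbm(uint64_t supported,
				  const struct fpstate_sketch *fps)
{
	uint64_t disabled = XFEATURE_MASK_USER_OPTIONAL & ~fps->enabled_optional;

	return supported & ~disabled;
}

/* Split the 64-bit mask into the EDX:EAX pair the instructions take. */
static inline void split_edx_eax(uint64_t mask, uint32_t *eax, uint32_t *edx)
{
	*eax = (uint32_t)mask;
	*edx = (uint32_t)(mask >> 32);
}
```

With this shape, a regular task's fpstate never requests CET_S, while a guest fpstate that KVM set up by hand does.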

But I suspect that would be too much work to implement in practice. It
would be akin to a new lesser kind of dynamic xstate, one that didn't
interact with XFD and *NEVER* gets allocated in the core kernel
fpstates, even on demand.

I want to hear more about who is going to use CET_S state under KVM in
practice. I don't want to touch it if this is some kind of purely
academic exercise. But it's also silly to hack some kind of temporary
solution into KVM that we'll rip out in a year when real supervisor
shadow stack support comes along.

If it's actually necessary, we should probably just eat the 24 bytes in
the fpstates, flip the bit in IA32_XSS and move on. There shouldn't be
any other meaningful impact to the core kernel.

If that's good to you, why not.
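
For reference, a small sketch of where the 24 bytes come from, assuming the CET_S component is just the three PL{0,1,2}_SSP values at 8 bytes each. The struct and helper names are illustrative, not kernel definitions; the IA32_XSS index and the CET_S bit position (12) are architectural.

```c
#include <stdint.h>

#define MSR_IA32_XSS		0xda0u	/* architectural MSR index */
#define XFEATURE_CET_S_BIT	12	/* CET_S position in IA32_XSS */

/* Illustrative layout of the CET supervisor xstate component. */
struct cet_s_state_sketch {
	uint64_t pl0_ssp;
	uint64_t pl1_ssp;
	uint64_t pl2_ssp;
};

_Static_assert(sizeof(struct cet_s_state_sketch) == 24,
	       "CET_S state is three 8-byte shadow stack pointers");

/* Return the IA32_XSS value with supervisor CET state enabled. */
static inline uint64_t xss_with_cet_s(uint64_t xss)
{
	return xss | (1ULL << XFEATURE_CET_S_BIT);
}
```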

Paolo