Re: [PATCH v5 09/19] KVM:x86: Make guest supervisor states as non-XSAVE managed

From: Dave Hansen
Date: Thu Aug 10 2023 - 10:29:51 EST


On 8/10/23 02:29, Yang, Weijiang wrote:
...
> When KVM enumerates shadow stack support for the guest in CPUID(0x7,
> 0).ECX[bit7], it architecturally claims that both user and supervisor
> mode shadow stacks are supported. Although the latter is not supported
> in Linux, a guest OS in the virtualization world could be a non-Linux
> system, so KVM supervisor state support is necessary in this case.

What actual OSes need this support?

> Two solutions are on the table:
> 1) Enable CET supervisor support in Linux kernel like user mode support.

We _will_ do this eventually, but not until FRED is merged. The core
kernel also probably won't be managing the MSRs on non-FRED hardware.

I think what you're really talking about here is that the kernel would
enable CET_S XSAVE state management so that CET_S state could be managed
by the core kernel's FPU code.

That is, frankly, *NOT* like the user mode support at all.

> 2) Enable support in KVM domain.
>
> Problem:
> The Pros/Cons for each solution (my individual thoughts):
> In kernel solution:
> Pros:
> - Avoid saving/restoring 3 supervisor MSRs (PL{0,1,2}_SSP) on the vCPU
> execution path.
> - Easy for KVM to manage CET xstate bits for the guest.
> Cons:
> - Unnecessary supervisor state xsaves/xrstors operations for non-vCPU
> threads.

What operations would be unnecessary exactly?

> - Potentially extra storage space (24 bytes) for thread context.

Yep. This one is pretty unavoidable. But, we've kept MPX around in
this state for a looooooong time and nobody really seemed to care.

> KVM solution:
> Pros:
> - Does not touch the current kernel FPU management framework and logic.
> - No extra space or operations for non-vCPU threads.
> Cons:
> - Manually saving/restoring 3 supervisor MSRs is a performance burden for
> KVM.
> - It looks more like a hack for KVM, and some of the handling logic
> seems a bit awkward.

In a perfect world, we'd just allocate space for CET_S in the KVM
fpstates. The core kernel fpstates would have
XSTATE_BV[12]==XCOMP_BV[12]==0. An XRSTOR of the core kernel fpstates
would just set CET_S to its init state.
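
To make the two header bits above concrete, here is a rough userspace
sketch of how they are read. It is not kernel code: the struct and both
helper names are made up for illustration, and the component index is
passed in by the caller rather than hard-coded. The only behavior it
models is that a compacted buffer has no space for a component whose
XCOMP_BV bit is clear, and that a restore which requests a component
whose XSTATE_BV bit is clear leaves that component in its init state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the 64-byte XSAVE header. Only the two
 * bitmaps discussed above are named; the rest is reserved space. */
struct xsave_hdr_sketch {
	uint64_t xstate_bv;   /* components holding non-init data */
	uint64_t xcomp_bv;    /* bit 63: compacted format; low bits: components with space */
	uint64_t reserved[6];
};

static_assert(sizeof(struct xsave_hdr_sketch) == 64,
	      "the XSAVE header is 64 bytes");

/* True if the compacted buffer reserves space for this component. */
static bool buffer_has_space_for(const struct xsave_hdr_sketch *hdr,
				 unsigned int component)
{
	return (hdr->xcomp_bv >> component) & 1;
}

/* True if a restore with requested-feature bitmap 'rfbm' would put the
 * component into its init state: it is requested, but the buffer marks
 * it as holding no data. */
static bool restore_inits_component(const struct xsave_hdr_sketch *hdr,
				    uint64_t rfbm, unsigned int component)
{
	uint64_t bit = UINT64_C(1) << component;

	return (rfbm & bit) && !(hdr->xstate_bv & bit);
}
```

With a core kernel buffer whose CET_S bit is clear in both bitmaps,
buffer_has_space_for() is false and restore_inits_component() is true
whenever CET_S is in the requested mask, which is the "no storage, init
on restore" behavior described above.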

But I suspect that would be too much work to implement in practice. It
would be akin to a new lesser kind of dynamic xstate, one that didn't
interact with XFD and *NEVER* gets allocated in the core kernel
fpstates, even on demand.

I want to hear more about who is going to use CET_S state under KVM in
practice. I don't want to touch it if this is some kind of purely
academic exercise. But it's also silly to hack some kind of temporary
solution into KVM that we'll rip out in a year when real supervisor
shadow stack support comes along.

If it's actually necessary, we should probably just eat the 24 bytes in
the fpstates, flip the bit in IA32_XSS and move on. There shouldn't be
any other meaningful impact to the core kernel.
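
For a sense of scale, the 24 bytes are just the three supervisor shadow
stack pointers (PL{0,1,2}_SSP) at 8 bytes each. The sketch below is
illustrative only: the struct and helper names are invented, the
component index is left to the caller, and it ignores any alignment
padding a real XSAVE buffer layout might add.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the CET_S component: one 64-bit shadow stack
 * pointer per supervisor privilege level. */
struct cet_s_state_sketch {
	uint64_t pl0_ssp;
	uint64_t pl1_ssp;
	uint64_t pl2_ssp;
};

static_assert(sizeof(struct cet_s_state_sketch) == 24,
	      "three 8-byte MSR values");

/* "Flip the bit": return the supervisor state mask with one more
 * component enabled. */
static uint64_t xss_with_component(uint64_t xss, unsigned int component)
{
	return xss | (UINT64_C(1) << component);
}

/* Extra fpstate storage if every thread carries the component. */
static size_t extra_bytes_for_threads(size_t nr_threads)
{
	return nr_threads * sizeof(struct cet_s_state_sketch);
}
```

Under these assumptions, 1000 threads cost an extra 24000 bytes in
total, which is the cost being weighed against a KVM-only workaround.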