Re: [PATCH v3 00/21] Enable CET Virtualization

From: Peter Zijlstra
Date: Thu Jul 20 2023 - 04:04:43 EST


On Thu, Jul 20, 2023 at 07:26:04AM +0200, Pankaj Gupta wrote:
> > > My understanding is that PL[0-2]_SSP are used only on transitions to the
> > > corresponding privilege level from a *different* privilege level. That means
> > > KVM should be able to utilize the user_return_msr framework to load the host
> > > values. Though if Linux ever supports SSS, I'm guessing the core kernel will
> > > have some sort of mechanism to defer loading MSR_IA32_PL0_SSP until an exit to
> > > userspace, e.g. to avoid having to write PL0_SSP, which will presumably be
> > > per-task, on every context switch.
> > >
> > > But note my original wording: **If that's necessary**
> > >
> > > If nothing in the host ever consumes those MSRs, i.e. if SSS is NOT enabled in
> > > IA32_S_CET, then running host stuff with guest values should be ok. KVM only
> > > needs to guarantee that it doesn't leak values between guests. But that should
> > > Just Work, e.g. KVM should load the new vCPU's values if SHSTK is exposed to the
> > > guest, and intercept (to inject #GP) if SHSTK is not exposed to the guest.
> > >
> > > And regardless of what mechanism ends up managing the SSP MSRs, it should only
> > > ever touch PL0_SSP, because Linux never runs anything at CPL1 or CPL2, i.e. will
> > > never consume PL{1,2}_SSP.
> >
> > To clarify, Linux will only use SSS in FRED mode -- FRED removes CPL1,2.
>
> Trying to understand better what prevents enabling SSS pre-FRED. Is it
> better #CP exception handling with other nested exceptions?

SSS took the syscall gap and made it worse -- as in *way* worse.

To top it off, the whole SSS busy bit thing is fundamentally
incompatible with how we manage to survive nested exceptions in NMI
context.

Basically, the whole x86 exception / stack switching logic was already
borderline impossible (consider taking an MCE in the early NMI path
where we have set up, but not finished, the re-entrancy stuff), and
SSS pushed it over the edge and set it on fire.
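A toy C model of the busy-bit conflict, assuming a simplified view in which each IST-style stack has a single shadow stack token whose low bit marks it busy while a handler runs on it. The struct and function names (toy_ss_token, toy_ist_enter, toy_ist_exit) are hypothetical, and a failed entry is simply returned as false rather than modelling the exact exception the CPU raises.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: a hypothetical per-IST shadow stack token with a busy
 * bit. It illustrates the idea; it is not the SDM's exact semantics. */
#define TOKEN_BUSY 0x1ULL

struct toy_ss_token {
	uint64_t value; /* token address, with bit 0 used as the busy flag */
};

/* Model of event delivery onto a dedicated (IST-style) shadow stack:
 * it succeeds only if the token is not already busy, then marks it busy. */
static bool toy_ist_enter(struct toy_ss_token *tok)
{
	if (tok->value & TOKEN_BUSY)
		return false; /* nested delivery to the same stack: modelled as a fault */
	tok->value |= TOKEN_BUSY;
	return true;
}

/* Model of returning from the handler: releases the token again. */
static void toy_ist_exit(struct toy_ss_token *tok)
{
	assert(tok->value & TOKEN_BUSY); /* must have been entered first */
	tok->value &= ~TOKEN_BUSY;
}
```

In this model a second toy_ist_enter() on the same token fails. Roughly, the existing NMI entry code depends on the opposite: a nested NMI is allowed to land on the same IST stack, and the entry code detects that and arranges to repeat the NMI, which a busy token does not permit.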

And NMI isn't the only problem; the various new virt exceptions #VC and
#HV are already near impossible on their own, and adding SSS on top pushes
the whole thing into clear insanity.

There's a good exposition of the whole trainwreck by Andrew here:

https://www.youtube.com/watch?v=qcORS8CN0ow

(sorry for the YouTube link, but Google is failing me in finding the
actual Google Doc that the talk is based on, or even the slide
deck :/)



FRED solves all that by:

- removing the stack gap; cs/ip/ss/sp/ssp/gs will all be switched
  atomically and consistently for every transition.

- removing the non-reentrant IST mechanism and replacing it with stack
  levels

- adding an explicit NMI latch

- re-organising the actual shadow stacks and doing away with that busy
  bit thing (I need to re-read the FRED spec on this detail again).
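A toy C model of the stack-level and NMI-latch points, under simplified assumptions: each event taken in ring 0 has a configured stack level, the CPU switches stacks only when that level is higher than the current stack level (otherwise the handler nests on the current stack), and the return unblocks NMIs only when the frame being unwound is the NMI's own. All names (toy_cpu, toy_frame, toy_deliver, toy_return) are hypothetical, and the model omits the real frame layout and MSRs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of two FRED ideas; not the FRED spec's exact semantics. */
struct toy_cpu {
	unsigned int csl; /* current stack level, 0..3 */
	bool nmi_blocked; /* explicit NMI latch */
};

/* What a toy "frame" remembers so the return can undo the delivery. */
struct toy_frame {
	unsigned int saved_csl;
	bool was_nmi;
	bool switched_stack;
};

/* Deliver an event taken at ring 0 whose configured stack level is
 * event_level. A stack switch happens only when the event needs a higher
 * level than the current one; otherwise the handler nests on the current
 * stack, so re-entry never reuses a fixed stack top as IST does. */
static struct toy_frame toy_deliver(struct toy_cpu *cpu,
				    unsigned int event_level, bool is_nmi)
{
	struct toy_frame f = {
		.saved_csl = cpu->csl,
		.was_nmi = is_nmi,
		.switched_stack = event_level > cpu->csl,
	};

	/* A blocked NMI stays pending; the caller must not deliver it. */
	assert(!(is_nmi && cpu->nmi_blocked));
	if (f.switched_stack)
		cpu->csl = event_level;
	if (is_nmi)
		cpu->nmi_blocked = true;
	return f;
}

/* Return from an event: restore the stack level, and unblock NMIs only
 * if the frame being unwound belongs to an NMI. */
static void toy_return(struct toy_cpu *cpu, const struct toy_frame *f)
{
	cpu->csl = f->saved_csl;
	if (f->was_nmi)
		cpu->nmi_blocked = false;
}
```

Here a #DB taken inside the NMI handler nests on the NMI's stack instead of resetting a fixed stack top, and returning from it leaves NMIs blocked. With the IDT, by contrast, any IRET unblocks NMIs, which is part of what the existing NMI entry code has to work around.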



Crazy as we are, we're not touching legacy/IDT SSS with a ten-foot pole,
sorry.