Re: [PATCH v8 00/26] Enable CET Virtualization

From: Sean Christopherson
Date: Thu Jan 04 2024 - 19:22:58 EST


On Thu, Jan 04, 2024, Rick P Edgecombe wrote:
> On Thu, 2024-01-04 at 15:11 +0800, Yang, Weijiang wrote:
> > > What is the design around CET and the KVM emulator?
> >
> > KVM doesn't emulate CET HW behavior for guest CET, instead it leaves CET
> > related checks and handling in guest kernel. E.g., if emulated JMP/CALL in
> > emulator triggers mismatch of data stack and shadow stack contents, #CP is
> > generated in non-root mode instead of being injected by KVM.  KVM only
> > emulates basic x86 HW behaviors, e.g., call/jmp/ret/in/out etc.
>
> Right. In the case of CET those basic behaviors (call/jmp/ret) now have
> host emulation behavior that doesn't match what guest execution would
> do.

I wouldn't say that KVM emulates "basic" x86. KVM emulates instructions that
BIOS and kernels execute in Big Real Mode (and other "illegal" modes prior to Intel
adding unrestricted guest), instructions that guests commonly use for MMIO, I/O,
and page table modifications, and few other tidbits that have cropped up over the
years.

In other words, as Weijiang suspects below, KVM's emulator handles juuust enough
stuff to squeak by and not barf on real world guests. It is not, and has never
been, anything remotely resembling a fully capable architectural emulator.

> > > My understanding is that the KVM emulator kind of does what it has to
> > > keep things running, and isn't expected to emulate every possible
> > > instruction. With CET though, it is changing the behavior of existing
> > > supported instructions. I could imagine a guest could skip over CET
> > > enforcement by causing an MMIO exit and racing to overwrite the exit-
> > > causing instruction from a different vcpu to be an indirect CALL/RET,
> > > etc.
> >
> > Can you elaborate the case? I cannot figure out how it works.
>
> The point that it should be possible for KVM to emulate call/ret with
> CET enabled. Not saying the specific case is critical, but the one I
> used as an example was that the KVM emulator can (or at least in the
> not too distant past) be forced to emulate arbitrary instructions if
> the guest overwrites the instruction between the exit and the SW fetch
> from the host. 
>
> The steps are:
> vcpu 1 vcpu 2
> -------------------------------------
> mov to mmio addr
> vm exit ept_misconfig
> overwrite mov instruction to call %rax
> host emulator fetches
> host emulates call instruction
>
> So then the guest call operation will skip the endbranch check. But I'm
> not sure that there are not less exotic cases that would run across it.
> I see a bunch of cases where write protected memory kicks to the
> emulator as well. Not sure the exact scenarios and whether this could
> happen naturally in races during live migration, dirty tracking, etc.

It's for shadow paging. Instead of _immediately_ zapping SPTEs on any write to
a shadowed guest PTE, KVM instead tries to emulate the faulting instruction (and
then still zaps SPTE). If KVM can't emulate the instruction for whatever reason,
then KVM will _usually_ just zap the SPTE and resume the guest, i.e. retry the
faulting instruction.

The reason KVM doesn't automatically/unconditionally zap and retry is that there
are circumstances where the guest can't make forward progress, e.g. if an
instruction is using a guest PTE that it is writing, if L2 is modifying L1 PTEs,
and probably a few other edge cases I'm forgetting.

> Again, I'm more just asking the exposure and thinking on it.

If you care about exposure to the emulator from a guest security perspective,
assume that a compromised guest can coerce KVM into attempting to emulate
arbitrary bytes. As in the situation described above, it's not _that_ difficult
to play games with TLBs and instruction vs. data caches.

If all you care about is not breaking misbehaving guests, I wouldn't worry too
much about it.

> > > With reasonable assumptions around the threat model in use by the guest
> > > this is probably not a huge problem. And I guess also reasonable
> > > assumptions about functional expectations, as a misshandled CALL or RET
> > > by the emulator would corrupt the shadow stack.
> >
> > KVM emulates general x86 HW behaviors, if something wrong happens after
> > emulation then it can happen even on bare metal, i.e., guest SW most likely
> > gets wrong somewhere and it's expected to trigger CET exceptions in guest
> > kernel.

No, the days of KVM making shit up from are done. IIUC, you're advocating that
it's ok for KVM to induce a #CP that architecturally should not happen. That is
not acceptable, full stop.

Retrying the instruction in the guest, exiting to userspace, and even terminating
the VM are all perfectly acceptable behaviors if KVM encounters something it can't
*correctly* emulate. But clobbering the shadow stack or not detecting a CFI
violation, even if the guest is misbehaving, is not ok.

> > > But, another thing to do could be to just return X86EMUL_UNHANDLEABLE or
> > > X86EMUL_RETRY_INSTR when CET is active and RET or CALL are emulated.
> >
> > IMHO, translating the CET induced exceptions into X86EMUL_UNHANDLEABLE or
> > X86EMUL_RETRY_INSTR would confuse guest kernel or even VMM, I prefer
> > letting guest kernel handle #CP directly.
>
> Doesn't X86EMUL_RETRY_INSTR kick it back to the guest which is what you
> want? Today it will do the operations without the special CET behavior.
>
> But I do see how this could be tricky to avoid the guest getting stuck
> in a loop with X86EMUL_RETRY_INSTR. I guess the question is if this
> situation is encountered, when KVM can't handle the emulation
> correctly, what should happen? I think usually it returns
> KVM_INTERNAL_ERROR_EMULATION to userspace? So I don't see why the CET
> case is different.
>
> If the scenario (call/ret emulation with CET enabled) doesn't happen,
> how can the guest be confused? If it does happen, won't it be an issue?
>
> > > And I guess also for all instructions if the TRACKER bit is set. It
> > > might tie up that loose end without too much trouble.
> > >
> > > Anyway, was there a conscious decision to just punt on CET enforcement in
> > > the emulator?
> >
> > I don't remember we ever discussed it in community, but since KVM
> > maintainers reviewed the CET virtualization series for a long time, I
> > assume we're moving on the right way :-)
>
> It seems like kind of leap that if it never came up that they must be
> approving of the specific detail. Don't know. Maybe they will chime in.

Yeah, I don't even know what the TRACKER bit does (I don't feel like reading the
SDM right now), let alone if what KVM does or doesn't do in response is remotely
correct.

For CALL/RET (and presumably any branch instructions with IBT?) other instructions
that are directly affected by CET, the simplest thing would probably be to disable
those in KVM's emulator if shadow stacks and/or IBT are enabled, and let KVM's
failure paths take it from there.

Then, *if* a use case comes along where the guest is utilizing CET and "needs"
KVM to emulate affected instructions, we can add the necessary support the emulator.

Alternatively, if teaching KVM's emulator to play nice with shadow stacks and IBT
is easy-ish, just do that.