Re: [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs

From: Sean Christopherson
Date: Wed Jul 12 2023 - 12:43:13 EST


On Fri, Jul 07, 2023, Weijiang Yang wrote:
> > Side topic, what on earth does the SDM mean by this?!?
> >
> > The linear address written must be aligned to 8 bytes and bits 2:0 must be 0
> > (hardware requires bits 1:0 to be 0).
> >
> > I know Intel retroactively changed the alignment requirements, but the above
> > is nonsensical. If ucode prevents writing bits 2:0, who cares what hardware
> > requires?
>
> Hi, Sean,
>
> Regarding the alignment check, I got an update from Gil:
>
> ==================================================
>
> The WRMSR instruction to load IA32_PL[0-3]_SSP will #GP if the value to be
> loaded sets either bit 0 or bit 1.  It does not check bit 2.
> IDT event delivery, when changing to rings 0-2, will load SSP from the MSR
> corresponding to the new ring.  These transitions check that bits 2:0 of the
> new value are all zero and will generate a nested fault if any of those bits
> are set.  (Far CALL using a call gate also checks this if changing CPL.)
>
> For a VMM that is emulating a WRMSR by a guest OS (because it was
> intercepting writes to that MSR), it suffices to perform the same checks as
> the CPU would (i.e., only bits 1:0):
> •    If the VMM sees bits 1:0 clear, it can perform the write on behalf of
> the guest OS.  If the guest OS later encounters a #GP during IDT event
> delivery (because bit 2 is set), it is its own fault.
> •    If the VMM sees either bit 0 or bit 1 set, it should inject a #GP into
> the guest, as that is what the CPU would do in this case.
>
> For an OS that is writing to the MSRs to set up shadow stacks, it should
> WRMSR the base addresses of those stacks.  Because of the token-based
> architecture used for supervisor shadow stacks (for rings 0-2), the base
> addresses of those stacks should be 8-byte aligned (clearing bits 2:0). 
> Thus, the values that an OS writes to the corresponding MSRs should clear
> bits 2:0.
>
> (Of course, most OS’s will use only the MSR for ring 0, as most OS’s do not
> use rings 1 and 2.)
>
> In contrast, the IA32_PL3_SSP MSR holds the current SSP for user software. 
> When a user thread is created, I suppose it may reference the base of the
> user shadow stack.  For a 32-bit app, that needs to be 4-byte aligned (bits
> 1:0 clear); for a 64-bit app, it may be necessary for it to be 8-byte
> aligned (bits 2:0 clear).
>
> Once the user thread is executing, the CPU will load IA32_PL3_SSP with the
> user’s value of SSP on every exception and interrupt to ring 0.  The value
> at that time may be 4-byte or 8-byte aligned, depending on how the user
> thread is using the shadow stack.  On context switches, the OS should WRMSR
> whatever value was saved (by RDMSR) the last time there was a context switch
> away from the incoming thread.  The OS should not need to inspect or change
> this value.
>
> ===================================================
>
> Based on his feedback, I think the VMM needs to check only bits 1:0 when
> writing the SSP MSRs. Is that right?

Yep, KVM should only check bits 1:0 when emulating WRMSR. KVM doesn't emulate
event delivery except for Real Mode, and I don't see that ever changing. So to
"handle" the #GP during event delivery case, KVM just needs to propagate the "bad"
value into guest context, which KVM needs to do anyways.
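
To make the two checks concrete, here is a rough, standalone sketch. The helper
names are made up for illustration and are not from the patch or from KVM; it
only encodes the bit rules described above (WRMSR faults on bits 1:0, event
delivery faults on bits 2:0) and ignores every other validity check on the
value.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not actual KVM code: the check a VMM would apply when
 * emulating WRMSR to IA32_PL[0-3]_SSP.  Per the description above, the CPU
 * #GPs only if bit 0 or bit 1 is set; bit 2 is not checked at WRMSR time.
 */
static inline bool ssp_wrmsr_value_is_valid(uint64_t data)
{
	return (data & 0x3) == 0;
}

/*
 * Hypothetical helper: the stricter check the CPU applies when it loads SSP
 * from the MSR during IDT event delivery to rings 0-2 (bits 2:0 must all be
 * zero).  A VMM that doesn't emulate event delivery never needs this; it is
 * here only to show the difference.
 */
static inline bool ssp_ok_for_event_delivery(uint64_t ssp)
{
	return (ssp & 0x7) == 0;
}
```

E.g. a value with only bit 2 set (0x1004) passes the WRMSR check and gets
written through to the guest's MSR, and the guest then takes the fault on its
own the next time event delivery loads that SSP.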

Thanks for following up on this!