Re: [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs

From: Yang, Weijiang
Date: Fri Jul 07 2023 - 05:10:44 EST



+ case MSR_IA32_PL3_SSP:
+ if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
+ return 1;
+ if (is_noncanonical_address(data, vcpu))
+ return 1;
+ if (msr_index == MSR_IA32_U_CET && (data & GENMASK(9, 6)))
+ return 1;
+ if (msr_index == MSR_IA32_PL3_SSP && (data & GENMASK(2, 0)))
Please #define reserved bits, ideally using the inverse of the valid masks. And
for SSP, it might be better to do IS_ALIGNED(data, 8) (or 4, pending my question
about the SDM's wording).

Side topic, what on earth does the SDM mean by this?!?

The linear address written must be aligned to 8 bytes and bits 2:0 must be 0
(hardware requires bits 1:0 to be 0).

I know Intel retroactively changed the alignment requirements, but the above
is nonsensical. If ucode prevents writing bits 2:0, who cares what hardware
requires?

Hi, Sean,

Regarding the alignment check, I got update from Gil:

==================================================

The WRMSR instruction to load IA32_PL[0-3]_SSP will #GP if the value to be loaded sets either bit 0 or bit 1.  It does not check bit 2.
IDT event delivery, when changing to rings 0-2 will load SSP from the MSR corresponding to the new ring.  These transitions check that bits 2:0 of the new value are all zero and will generate a nested fault if any of those bits are set.  (Far CALL using a call gate also checks this if changing CPL.)

For a VMM that is emulating a WRMSR by a guest OS (because it was intercepting writes to that MSR), it suffices to perform the same checks as the CPU would (i.e., only bits 1:0):
•    If the VMM sees bits 1:0 clear, it can perform the write on the part of the guest OS.  If the guest OS later encounters a #GP during IDT event delivery (because bit 2 is set), it is its own fault.
•    If the VMM sets either bit 0 or bit 1 set, it should inject a #GP into the guest, as that is what the CPU would do in this case.

For an OS that is writing to the MSRs to set up shadow stacks, it should WRMSR the base addresses of those stacks.  Because of the token-based architecture used for supervisor shadow stacks (for rings 0-2), the base addresses of those stacks should be 8-byte aligned (clearing bits 2:0).  Thus, the values that an OS writes to the corresponding MSRs should clear bits 2:0.

(Of course, most OS’s will use only the MSR for ring 0, as most OS’s do not use rings 1 and 2.)

In contrast, the IA32_PL3_SSP MSR holds the current SSP for user software.  When a user thread is created, I suppose it may reference the base of the user shadow stack.  For a 32-bit app, that needs to be 4-byte aligned (bits 1:0 clear); for a 64-bit app, it may be necessary for it to be 8-byte aligned (bits 2:0) clear.

Once the user thread is executing, the CPU will load IA32_PL3_SSP with the user’s value of SSP on every exception and interrupt to ring 0.  The value at that time may be 4-byte or 8-byte aligned, depending on how the user thread is using the shadow stack.  On context switches, the OS should WRMSR whatever value was saved (by RDMSR) the last time there was a context switch away from the incoming thread.  The OS should not need to inspect or change this value.

===================================================

Based on his feedback, I think VMM needs to check bits 1:0 when write the SSP MSRs. Is it?