Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES

From: Sean Christopherson
Date: Mon May 22 2023 - 20:25:07 EST


On Tue, Apr 11, 2023, Alexey Kardashevskiy wrote:
> Prior to SEV-ES, KVM saved/restored host debug registers upon switching
> to/from a VM. Changing those registers inside a running SEV VM
> triggered #VMEXIT to KVM.

Please, please don't make it sound like some behavior is *the* one and only behavior.
*KVM* *chooses* to intercept debug register accesses. Though I would omit this
paragraph (and largely the next) entirely, IMO it's safe to assume the reader has
a basic understanding of how KVM deals with DRs and #DBs.

> SEV-ES added encrypted state (ES) which uses an encrypted page
> for the VM state (VMSA). The hardware saves/restores certain registers
> on VMRUN/VMEXIT according to a swap type (A, B, C), see
> "Table B-3. Swap Types" in the AMD Architecture Programmer’s Manual
> volume 2.
>
> The DR6 and DR7 registers have always been swapped as Type A for SEV-ES

Please rewrite this to just state what the behavior is instead of "Type A" vs.
"Type B". Outside of AMD, the "type a/b/c" stuff isn't anywhere near ubiquitous
enough to justify obfuscating super simple concepts.

Actually, this feature really has nothing to do with Type A vs. Type B, since
that's purely about host state. I assume the switch from Type A to Type B is a
side effect, or an opportunistic optimization?

Regardless, something like this is a lot easier for a human to read. It's easy
enough to find DebugSwap in the APM (literally took me longer to find my copy of
the PDF).

Add support for "DebugSwap for SEV-ES guests", which provides support
for swapping DR[0-3] and DR[0-3]_ADDR_MASK on VMRUN and VMEXIT, i.e.
allows KVM to expose debug capabilities to SEV-ES guests. Without
DebugSwap support, the CPU doesn't save/load _guest_ debug registers,
and KVM cannot manually context switch guest DRs due the VMSA being
encrypted.

Enable DebugSwap if and only if the CPU also supports NoNestedDataBp,
which causes the CPU to ignore nested #DBs, i.e. #DBs that occur when
vectoring a #DB. Without NoNestedDataBp, a malicious guest can DoS
the host by putting the CPU into an infinite loop of vectoring #DBs
(see https://bugzilla.redhat.com/show_bug.cgi?id=1278496)

Set the features bit in sev_es_sync_vmsa() which is the last point
when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
init happens) is called not only when VCPU is initialized but also on
intrahost migration when VMSA is encrypted.

> guests, but a new feature is available, identified via
> CPUID Fn8000001F_EAX[14] "DebugSwap for SEV-ES guests", that provides
> support for swapping additional debug registers. DR[0-3] and
> DR[0-3]_ADDR_MASK are swapped as Type B when SEV_FEATURES[5] (DebugSwap)
> is set.
>
> Enable DebugSwap for a VMSA but only do so if CPUID Fn80000021_EAX[0]
> ("NoNestedDataBp", "Processor ignores nested data breakpoints") is
> supported by the SOC as otherwise a malicious SEV-ES guest can set up
> data breakpoints on the #DB IDT entry/stack and cause an infinite loop.
> Set the features bit in sev_es_sync_vmsa() which is the last point
> when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
> init happens) is called not only when VCPU is initialized but also on
> intrahost migration when VMSA is encrypted.
>
> Eliminate DR7 and #DB intercepts as:
> - they are not needed when DebugSwap is supported;

"not needed" isn't sufficient justification. KVM doesn't strictly need to do a
lot of things, but does them anyways for a variety of reasons. E.g. #DB intercept
is also not needed when NoNestedDataBp is supported and vcpu->guest_debug==0, i.e.
this should clarify why KVM doesn't simply disable #DB intercepts for _all_ VMs
when NoNestedDataBp is support. Presumably the answer is "because it's simpler
than toggling #DB interception for guest_debug.

Actually, can't disabling #DB interception for DebugSwap SEV-ES guests be a
separate patch? KVM can still inject #DBs for SEV-ES guests, no?

As for DR7, the real justification is that, as above, KVM can't modify guest DR7,
and intercepting DR7 would completely defeat the purpose of enabling DebugSwap.

> - #VC for these intercepts is most likely not supported anyway and
> kills the VM.

I don't see how this is relevant or helpful. What the guest may or may not do in
response to a #VC on a DR7 write has no bearing on KVM's behavior.

> Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxx>
> Reviewed-by: Tom Lendacky <thomas.lendacky@xxxxxxx>
> ---

...

> @@ -3048,6 +3066,18 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
>
> /* MSR_IA32_XSS is restored on VMEXIT, save the currnet host value */
> hostsa->xss = host_xss;
> +
> + /* The DebugSwap SEV feature does Type B swaps of DR[0-3] */

Since dangling a carrot didn't work[*], I'm going to resort to extortion :-)
Can you fold the below somewhere before this patch, and then tweak this comment
to say something like:

/*
* If DebugSwap is enabled, debug registers are loaded but NOT saved by
* the CPU (Type-B). If DebugSwap is disabled/unsupported, the CPU both
* saves and loads debug registers (Type-A).
*/

[*] https://lore.kernel.org/all/YzOTOzUzKPQSqURo@xxxxxxxxxx/

--
From: Sean Christopherson <seanjc@xxxxxxxxxx>
Date: Mon, 22 May 2023 16:29:52 -0700
Subject: [PATCH] KVM: SVM: Rewrite sev_es_prepare_switch_to_guest()'s comment
about swap types

Rewrite the comment(s) in sev_es_prepare_switch_to_guest() to explain the
swap types employed by the CPU for SEV-ES guests, i.e. to explain why KVM
needs to save a seemingly random subset of host state, and to provide a
decoder for the APM's Type-A/B/C terminology.

Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
---
arch/x86/kvm/svm/sev.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 69ae5e1b3120..1e0e9b08e491 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3017,19 +3017,24 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
{
/*
- * As an SEV-ES guest, hardware will restore the host state on VMEXIT,
- * of which one step is to perform a VMLOAD. KVM performs the
- * corresponding VMSAVE in svm_prepare_guest_switch for both
- * traditional and SEV-ES guests.
+ * All host state for SEV-ES guests is categorized into three swap types
+ * based on how it is handled by hardware during a world switch:
+ *
+ * A: VMRUN: Host state saved in host save area
+ * VMEXIT: Host state loaded from host save area
+ *
+ * B: VMRUN: Host state _NOT_ saved in host save area
+ * VMEXIT: Host state loaded from host save area
+ *
+ * C: VMRUN: Host state _NOT_ saved in host save area
+ * VMEXIT: Host state initialized to default(reset) values
+ *
+ * Manually save type-B state, i.e. state that is loaded by VMEXIT but
+ * isn't saved by VMRUN, that isn't already saved by VMSAVE (performed
+ * by common SVM code).
*/
-
- /* XCR0 is restored on VMEXIT, save the current host value */
hostsa->xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
-
- /* PKRU is restored on VMEXIT, save the current host value */
hostsa->pkru = read_pkru();
-
- /* MSR_IA32_XSS is restored on VMEXIT, save the currnet host value */
hostsa->xss = host_xss;
}


base-commit: 39428f6ea9eace95011681628717062ff7f5eb5f
--