Re: [PATCH v4 20/21] x86/efistub: Perform SNP feature test while running in the firmware

From: Tom Lendacky
Date: Fri Jun 02 2023 - 18:01:34 EST


On 6/2/23 16:29, Ard Biesheuvel wrote:
On Fri, 2 Jun 2023 at 22:39, Tom Lendacky <thomas.lendacky@xxxxxxx> wrote:

On 6/2/23 15:38, Tom Lendacky wrote:
On 6/2/23 05:13, Ard Biesheuvel wrote:
Before refactoring the EFI stub boot flow to avoid the legacy bare metal
decompressor, duplicate the SNP feature check in the EFI stub before
handing over to the kernel proper.

The SNP feature check can be performed while running under the EFI boot
services, which means we can fail gracefully and return an error to the
bootloader if the loaded kernel does not implement support for all the
features that the hypervisor enabled.

Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
---
arch/x86/boot/compressed/sev.c | 74 ++++++++++++--------
arch/x86/include/asm/sev.h | 4 ++
drivers/firmware/efi/libstub/x86-stub.c | 17 +++++
3 files changed, 67 insertions(+), 28 deletions(-)

diff --git a/arch/x86/boot/compressed/sev.c
b/arch/x86/boot/compressed/sev.c
index 014b89c890887b9a..be021e24f1ece421 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c


+void sev_enable(struct boot_params *bp)
+{
+ unsigned int eax, ebx, ecx, edx;
bool snp;
/*
@@ -358,37 +391,14 @@ void sev_enable(struct boot_params *bp)
*/
snp = snp_init(bp);
- /* Check for the SME/SEV support leaf */
- eax = 0x80000000;
- ecx = 0;
- native_cpuid(&eax, &ebx, &ecx, &edx);
- if (eax < 0x8000001f)
- return;
-
- /*
- * Check for the SME/SEV feature:
- * CPUID Fn8000_001F[EAX]
- * - Bit 0 - Secure Memory Encryption support
- * - Bit 1 - Secure Encrypted Virtualization support
- * CPUID Fn8000_001F[EBX]
- * - Bits 5:0 - Pagetable bit position used to indicate encryption
- */
- eax = 0x8000001f;
- ecx = 0;
- native_cpuid(&eax, &ebx, &ecx, &edx);
- /* Check whether SEV is supported */
- if (!(eax & BIT(1))) {
+ /* Set the SME mask if this is an SEV guest. */
+ sev_status = sev_get_status();
+ if (!(sev_status & MSR_AMD64_SEV_ENABLED)) {
if (snp)
error("SEV-SNP support indicated by CC blob, but not
CPUID.");
return;
}
- /* Set the SME mask if this is an SEV guest. */
- boot_rdmsr(MSR_AMD64_SEV, &m);
- sev_status = m.q;
- if (!(sev_status & MSR_AMD64_SEV_ENABLED))
- return;
-
/* Negotiate the GHCB protocol version. */
if (sev_status & MSR_AMD64_SEV_ES_ENABLED) {
if (!sev_es_negotiate_protocol())
@@ -409,6 +419,14 @@ void sev_enable(struct boot_params *bp)
if (snp && !(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
error("SEV-SNP supported indicated by CC blob, but not SEV
status MSR.");
+ /*
+ * Check for the SME/SEV feature:
+ * CPUID Fn8000_001F[EBX]
+ * - Bits 5:0 - Pagetable bit position used to indicate encryption
+ */
+ eax = 0x8000001f;
+ ecx = 0;
+ native_cpuid(&eax, &ebx, &ecx, &edx);

This causes SEV-ES / SEV-SNP to crash.

This goes back to a previous comment where calling either
sev_es_negotiate_protocol() or get_hv_features() blows away the GHCB value
in the GHCB MSR and as soon as the CPUID instruction is executed the boot
blows up.

Even if we move this up to be done earlier, we can complete this function
successfully but then blow up further on.

So you probably have to modify the routines in question to save and
restore the GHCB MSR value.

I should clarify that it doesn't in fact cause a problem until the final
patch is applied and this path is taken.


Could we just move the CPUID call to the start of the function?

I tried that and it allowed sev_enable() to complete successfully, but then it blew up after that.

But I noticed that the patch to apply the kernel CS earlier is no longer part of this series. When I applied the patch to move the setting of the kernel CS directly after the call to startup_64_setup_env() in arch/x86/kernel/head_64.S, everything worked again.

That patch hasn't been pulled into tip, yet, and since you dropped your version, the CS value wrong and the IRET from the #VC blows up.

So as long as you pre-req that patch, unless Boris or Dave would prefer it to go in with this series, moving the call to native_cpuid() up to just before the GHCB protocol version negotiation, works.

Thanks,
Tom