Re: [PATCH] KVM: x86: Ignore MSR_AMD64_BU_CFG access

From: Maciej S. Szmigiero
Date: Mon Sep 25 2023 - 14:53:54 EST


On 25.09.2023 20:30, Sean Christopherson wrote:
On Mon, Sep 25, 2023, Maciej S. Szmigiero wrote:
From: "Maciej S. Szmigiero" <maciej.szmigiero@xxxxxxxxxx>

Hyper-V enabled Windows Server 2022 KVM VM cannot be started on Zen1 Ryzen
since it crashes at boot with SYSTEM_THREAD_EXCEPTION_NOT_HANDLED +
STATUS_PRIVILEGED_INSTRUCTION (in other words, because of an unexpected #GP
in the guest kernel).

This is because Windows tries to set bit 8 in MSR_AMD64_BU_CFG and can't
handle receiving a #GP when doing so.

Any idea why?

I guess it is trying to set some chicken bit?

By the way, I tested Windows Server 2019 now - it has the same problem.

So likely Windows 11 and newer version of Windows 10 have it, too.

Give this MSR the same treatment that commit 2e32b7190641
("x86, kvm: Add MSR_AMD64_BU_CFG2 to the list of ignored MSRs") gave
MSR_AMD64_BU_CFG2 under justification that this MSR is baremetal-relevant
only.

Ugh, that commit set a terrible example. The kernel change should have been
conditioned on !X86_FEATURE_HYPERVISOR if the MSR only has meaning for bare metal.

You are right with respect to the original guest kernel change that
triggered the later KVM commit ignoring MSR_AMD64_BU_CFG2.

This doesn't help Windows guests, however.

Although apparently it was then needed for Linux guests, not Windows as in
this case.

With this change, the aforementioned guest setup is able to finish booting
successfully.

This issue can be reproduced either on a Summit Ridge Ryzen (with
just "-cpu host") or on a Naples EPYC (with "-cpu host,stepping=1" since
EPYC is ordinarily stepping 2).

This seems like it needs to be tagged for stable?

Like with just "Cc: stable@xxxxxxxxxxxxxxx", but without "Fixes:" tag?
Can do.

Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@xxxxxxxxxx>
---
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kvm/x86.c | 2 ++
2 files changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 1d111350197f..c80a5cea80c4 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -553,6 +553,7 @@
#define MSR_AMD64_CPUID_FN_1 0xc0011004
#define MSR_AMD64_LS_CFG 0xc0011020
#define MSR_AMD64_DC_CFG 0xc0011022
+#define MSR_AMD64_BU_CFG 0xc0011023

What document actually defines this MSR? All of the PPRs I can find for Family 17h
list it as:

MSRC001_1023 [Table Walker Configuration] (Core::X86::Msr::TW_CFG)

It's partially documented in various AMD BKDGs, however I couldn't find
any definition for this particular bit (8) - other than that it is reserved.

#define MSR_AMD64_DE_CFG 0xc0011029
#define MSR_AMD64_DE_CFG_LFENCE_SERIALIZE_BIT 1
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9f18b06bbda6..2f3cdd798185 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3639,6 +3639,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_UCODE_WRITE:
case MSR_VM_HSAVE_PA:
case MSR_AMD64_PATCH_LOADER:
+ case MSR_AMD64_BU_CFG:

I am sorely tempted to say that this should be solved in userspace via MSR
filtering. IIUC, the MSR truly is model specific, and I don't love the idea of
effectively ignoring accesses to unknown MSRs. And I really, really don't want
KVM to pivot on FMS.

Paolo, is punting to userspace reasonable, or should we just bite the bullet in
KVM and commit to ignoring MSRs like this?


Waiting for Paolo's decision here then.

Thanks,
Maciej