Re: [PATCH] KVM: x86: enable TDP MMU by default

From: Paolo Bonzini
Date: Tue Jul 26 2022 - 11:43:38 EST


On 7/26/22 16:57, Stoiko Ivanov wrote:
Hi,

Proxmox[0] recently switched to the 5.15 kernel series (based on the one
for Ubuntu 22.04), which includes this commit.
While it's working well on most installations, we have a few users who
reported that some of their guests shut down with
`KVM: entry failed, hardware error 0x80000021` being logged under certain
conditions and environments[1]:
* The issue is not deterministically reproducible, and only happens
eventually with certain loads (e.g. we have only one system in our
office which exhibits the issue, and only when repeatedly installing
Windows 2k22: roughly one in 10 installs crashes the guest)
* While most reports are referring to (newer) Windows guests, some users
run into the issue with Linux VMs as well
* The affected systems are from a quite wide range - our affected machine
is an old IvyBridge Xeon with outdated BIOS (an equivalent system with
the latest available BIOS is not affected), but we have
reports of all kinds of Intel CPUs (up to an i5-12400). It seems AMD CPUs
are not affected.

Disabling tdp_mmu seems to mitigate the issue, but I still thought you
might want to know that in some cases tdp_mmu causes problems, or that you
even might have an idea of how to fix the issue without explicitly
disabling tdp_mmu?
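
(For readers who want to check whether the mitigation above is active on a
host: a minimal sketch, assuming the kvm module exposes its tdp_mmu
parameter under /sys/module/kvm/parameters/ as a Y/N boolean. The helper
names are hypothetical and not part of any KVM or Proxmox tooling.)

```python
from pathlib import Path

# Assumed sysfs location of the kvm module's tdp_mmu parameter.
KVM_TDP_MMU_PARAM = "/sys/module/kvm/parameters/tdp_mmu"


def parse_bool_param(text):
    """Map a kernel boolean module parameter string to True/False/None."""
    value = text.strip()
    if value in ("Y", "y", "1"):
        return True
    if value in ("N", "n", "0"):
        return False
    return None  # unexpected content


def read_tdp_mmu(path=KVM_TDP_MMU_PARAM):
    """Return the tdp_mmu state, or None if the file cannot be read."""
    try:
        return parse_bool_param(Path(path).read_text())
    except OSError:
        return None  # kvm not loaded, or parameter not exposed


if __name__ == "__main__":
    state = read_tdp_mmu()
    print({True: "enabled", False: "disabled", None: "unknown"}[state])
```

(The parameter itself would typically be set at module load time, e.g. with
kvm.tdp_mmu=N on the kernel command line; the script only reads the
current value.)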

If you don't need secure boot, you can try disabling SMM. It should not be related to TDP MMU, but the logs (thanks!) point at an SMM entry (RIP = 0x8000, CS base=0x7ffc2000).
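
(Side note on the error code in the report: a minimal sketch of how 0x80000021 can be decoded, assuming the logged value is the raw VMX exit reason, where bit 31 flags a VM-entry failure and bits 15:0 hold the basic exit reason. The function name and the small lookup table are illustrative only; the table lists just the VM-entry failure reasons.)

```python
# Bit 31 of the VMX exit reason is set when the VM entry itself failed.
VM_ENTRY_FAILURE_BIT = 1 << 31

# Basic exit reasons that describe a failed VM entry (Intel SDM numbering).
ENTRY_FAILURE_REASONS = {
    33: "VM-entry failure due to invalid guest state",
    34: "VM-entry failure due to MSR loading",
    41: "VM-entry failure due to machine-check event",
}


def decode_entry_failure(code):
    """Split a hardware error code into (entry_failed, basic_reason, text)."""
    entry_failed = bool(code & VM_ENTRY_FAILURE_BIT)
    basic_reason = code & 0xFFFF  # bits 15:0
    text = ENTRY_FAILURE_REASONS.get(basic_reason, "unknown")
    return entry_failed, basic_reason, text


if __name__ == "__main__":
    failed, reason, text = decode_entry_failure(0x80000021)
    print(f"entry_failed={failed} basic_reason={reason} ({text})")
```

(Under that assumption the reported code decodes to basic reason 33, invalid guest state on VM entry, which fits a failure on the Intel side only.)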

This is likely to be fixed by https://lore.kernel.org/kvm/20220621150902.46126-1-mlevitsk@xxxxxxxxxx/.

Paolo