[RFC PATCH v4 2/2] KVM: x86/mmu: replace BUG() with KVM_BUG() in shadow mmu

From: Mingwei Zhang
Date: Tue Nov 29 2022 - 14:13:02 EST


Replace BUG() in pte_list_remove() with KVM_BUG() to avoid crashing the
host. MMU bugs are difficult to discover because of race conditions and
corner cases, which makes them extremely hard to debug. The situation gets
much worse when such a bug brings down the host: a host crash wipes out
everything, including any clues that could help with debugging.

BUG() and BUG_ON() are probably no longer appropriate, because host
reliability is the top priority in many business scenarios. Crashing the
physical machine is almost never a good option: it kills innocent VMs and
causes service outages on a larger scale. Even worse, if an attacker can
reliably trigger this code by diverting control flow, corrupting memory,
or leveraging a KVM bug, it becomes a VM-of-death attack. This is a huge
attack vector for cloud providers, because the death of a single host
machine is not the end of the story. Without manual intervention, a failed
cloud job may be rescheduled onto other hosts and keep crashing them until
all of them are dead.

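For these reasons, shrink the scope of the crash to the target VM only:
KVM_BUG() marks just the offending VM as bugged, pte_list_remove() returns
early instead of touching the inconsistent rmap, and the host keeps running.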
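
To make the containment concrete, here is a minimal userspace model of the
KVM_BUG() behavior this patch relies on. It is a sketch, not kernel code:
model_vm, model_kvm_bug() and model_remove() are hypothetical names. The
real KVM_BUG() is a macro in include/linux/kvm_host.h built on WARN_ONCE()
and kvm_vm_bugged(); the model keeps only the two properties that matter
here: the failure is confined to one VM, and the condition is returned so
the caller can bail out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for struct kvm: only the per-VM "bugged" flag
 * that KVM_BUG() sets is modeled.
 */
struct model_vm {
	bool vm_bugged;
};

/*
 * Model of KVM_BUG(cond, kvm, fmt...): if cond is true, report it, mark
 * only this VM as bugged, and return cond so the caller can bail out.
 * (The real macro uses WARN_ONCE(), which also limits the report to once
 * per call site, and kvm_vm_bugged(), which marks the whole VM dead.)
 */
static bool model_kvm_bug(bool cond, struct model_vm *vm, const char *msg)
{
	assert(vm != NULL);
	if (cond && !vm->vm_bugged) {
		fprintf(stderr, "WARNING: %s\n", msg);
		vm->vm_bugged = true;
	}
	return cond;
}

/* Hypothetical caller shaped like the first check in pte_list_remove(). */
static int model_remove(struct model_vm *vm, unsigned long rmap_val)
{
	if (model_kvm_bug(rmap_val == 0, vm, "rmap is empty"))
		return -1; /* skip the removal; only this VM is affected */
	/* ... the normal removal path would go here ... */
	return 0;
}
```

With BUG(), the first failing call would take down the whole host; in the
model, only the offending VM's vm_bugged flag flips and every other VM
keeps running.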

Cc: Nagareddy Reddy <nspreddy@xxxxxxxxxx>
Cc: Jim Mattson <jmattson@xxxxxxxxxx>
Cc: David Matlack <dmatlack@xxxxxxxxxx>
Signed-off-by: Mingwei Zhang <mizhang@xxxxxxxxxx>
---
arch/x86/kvm/mmu/mmu.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
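
For reviewers, a simplified userspace model of the three rmap_head states
that pte_list_remove() distinguishes, and of the three failure paths that
used to call BUG(). All names (model_desc, model_rmap_head,
model_pte_list_remove(), ...) are hypothetical; the descriptor layout and
the removal strategy are simplified and do not match the kernel's helpers
exactly. Each failure returns an error code instead of crashing, mirroring
the early returns this patch adds after KVM_BUG().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for KVM's struct pte_list_desc. */
#define MODEL_DESC_SPTES 3

struct model_desc {
	uint64_t *sptes[MODEL_DESC_SPTES];
	int spte_count;
	struct model_desc *more;
};

/*
 * Stand-in for struct kvm_rmap_head:
 *   val == 0     -> empty
 *   bit 0 clear  -> val is the single spte pointer
 *   bit 0 set    -> (val & ~1) points to a descriptor chain
 */
struct model_rmap_head {
	uintptr_t val;
};

enum model_result {
	MODEL_OK,
	MODEL_EMPTY,           /* old "0->BUG" path */
	MODEL_SINGLE_MISMATCH, /* old "1->BUG" path */
	MODEL_NOT_FOUND,       /* old "many->many" BUG path */
};

static void model_rmap_set_single(struct model_rmap_head *head, uint64_t *spte)
{
	/* Pointers are at least 2-byte aligned, so bit 0 is free for the tag. */
	assert(((uintptr_t)spte & 1) == 0);
	head->val = (uintptr_t)spte;
}

static void model_rmap_set_many(struct model_rmap_head *head,
				struct model_desc *desc)
{
	assert(((uintptr_t)desc & 1) == 0);
	head->val = (uintptr_t)desc | 1;
}

static enum model_result model_pte_list_remove(uint64_t *spte,
						struct model_rmap_head *head)
{
	struct model_desc *desc;
	int i;

	if (!head->val)
		return MODEL_EMPTY;

	if (!(head->val & 1)) {
		if ((uint64_t *)head->val != spte)
			return MODEL_SINGLE_MISMATCH;
		head->val = 0;
		return MODEL_OK;
	}

	for (desc = (struct model_desc *)(head->val & ~(uintptr_t)1); desc;
	     desc = desc->more) {
		for (i = 0; i < desc->spte_count; i++) {
			if (desc->sptes[i] == spte) {
				/* Simplified: fill the hole with the last entry. */
				desc->sptes[i] = desc->sptes[--desc->spte_count];
				return MODEL_OK;
			}
		}
	}
	return MODEL_NOT_FOUND;
}
```

The low-bit tag works only because real spte and descriptor pointers always
have bit 0 clear; the asserts in the setters make that assumption explicit.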

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b5a44b8f5f7b..12790ccb8731 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -954,15 +954,16 @@ static void pte_list_remove(struct kvm *kvm, u64 *spte,
 	struct pte_list_desc *prev_desc;
 	int i;
 
-	if (!rmap_head->val) {
-		pr_err("%s: %p 0->BUG\n", __func__, spte);
-		BUG();
-	} else if (!(rmap_head->val & 1)) {
+	if (KVM_BUG(!rmap_head->val, kvm, "rmap for %p is empty", spte))
+		return;
+
+	if (!(rmap_head->val & 1)) {
 		rmap_printk("%p 1->0\n", spte);
-		if ((u64 *)rmap_head->val != spte) {
-			pr_err("%s: %p 1->BUG\n", __func__, spte);
-			BUG();
-		}
+
+		if (KVM_BUG((u64 *)rmap_head->val != spte, kvm,
+			    "single rmap for %p doesn't match", spte))
+			return;
+
 		rmap_head->val = 0;
 	} else {
 		rmap_printk("%p many->many\n", spte);
@@ -979,8 +980,7 @@ static void pte_list_remove(struct kvm *kvm, u64 *spte,
 			prev_desc = desc;
 			desc = desc->more;
 		}
-		pr_err("%s: %p many->many\n", __func__, spte);
-		BUG();
+		KVM_BUG(true, kvm, "no rmap for %p (many->many)", spte);
 	}
 }

--
2.38.1.584.g0f3c55d4c2-goog