[PATCH] KVM: Avoid illegal stage2 mapping on invalid memory slot

From: Gavin Shan
Date: Thu Jun 08 2023 - 05:05:16 EST


We run into guest hang in edk2 firmware when KSM is kept as running
on the host. The edk2 firmware is waiting for status 0x80 from QEMU's
pflash device (TYPE_PFLASH_CFI01) during the operation for sector
erasing or buffered write. The status is returned by reading the
memory region of the pflash device and the read request should
have been forwarded to QEMU and emulated by it. Unfortunately, the
read request is covered by an illegal stage2 mapping when the guest
hang issue occurs. The read request is completed with QEMU bypassed and
wrong status is fetched.

The illegal stage2 mapping is populated due to same page mering by
KSM at (C) even the associated memory slot has been marked as invalid
at (B).

CPU-A CPU-B
----- -----
ioctl(kvm_fd, KVM_SET_USER_MEMORY_REGION)
kvm_vm_ioctl_set_memory_region
kvm_set_memory_region
__kvm_set_memory_region
kvm_set_memslot(kvm, old, NULL, KVM_MR_DELETE)
kvm_invalidate_memslot
kvm_copy_memslot
kvm_replace_memslot
kvm_swap_active_memslots (A)
kvm_arch_flush_shadow_memslot (B)
same page merging by KSM
kvm_mmu_notifier_change_pte
kvm_handle_hva_range
__kvm_handle_hva_range (C)

Fix the issue by skipping the invalid memory slot at (C) to avoid the
illegal stage2 mapping. Without the illegal stage2 mapping, the read
request for the pflash's status is forwarded to QEMU and emulated by
it. The correct pflash's status can be returned from QEMU to break
the infinite wait in edk2 firmware.

Cc: stable@xxxxxxxxxxxxxxx # v5.13+
Fixes: 3039bcc74498 ("KVM: Move x86's MMU notifier memslot walkers to generic code")
Reported-by: Shuai Hu <hshuai@xxxxxxxxxx>
Reported-by: Zhenyu Zhang <zhenyzha@xxxxxxxxxx>
Signed-off-by: Gavin Shan <gshan@xxxxxxxxxx>
---
virt/kvm/kvm_main.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 479802a892d4..7f81a3a209b6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -598,6 +598,9 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
unsigned long hva_start, hva_end;

slot = container_of(node, struct kvm_memory_slot, hva_node[slots->node_idx]);
+ if (slot->flags & KVM_MEMSLOT_INVALID)
+ continue;
+
hva_start = max(range->start, slot->userspace_addr);
hva_end = min(range->end, slot->userspace_addr +
(slot->npages << PAGE_SHIFT));
--
2.23.0