Re: [PATCH 5/6] KVM: x86/mmu: Protect kvm->memslots with a mutex

From: Paolo Bonzini
Date: Wed Apr 28 2021 - 19:43:02 EST


On 28/04/21 23:46, Ben Gardon wrote:
On Wed, Apr 28, 2021 at 2:41 PM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:

On 28/04/21 22:40, Ben Gardon wrote:
... However with the locking you propose below, we might still run
into issues on a move or delete, which would mean we'd still need the
separate memory allocation for the rmaps array. Or we do some
shenanigans where we try to copy the rmap pointers from the other set
of memslots.

If that's (almost) as easy as passing old to
kvm_arch_prepare_memory_region, that would be totally okay.

Unfortunately it's not quite that easy because it's all the slots
_besides_ the one being modified where we'd need to copy the rmaps.

Ah, now I understand the whole race. And it seems to me that if one
kvm_dup_memslots within the new lock fixed a bug, two kvm_dup_memslots
within the new lock are going to fix two bugs. :)

Seriously: unless I'm missing another case (it's late here...), it's
not ugly and it's still relatively easy to explain.

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2799c6660cce..48929dd5fb29 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1270,7 +1270,7 @@ static int check_memory_region_flags(const struct kvm_userspace_memory_region *m
return 0;
}
-static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
+static void install_new_memslots(struct kvm *kvm,
int as_id, struct kvm_memslots *slots)
{
struct kvm_memslots *old_memslots = __kvm_memslots(kvm, as_id);
@@ -1280,7 +1280,9 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
slots->generation = gen | KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS;
rcu_assign_pointer(kvm->memslots[as_id], slots);
+ mutex_unlock(&kvm->slots_arch_lock);
synchronize_srcu_expedited(&kvm->srcu);
+ kvfree(old_memslots);
/*
* Increment the new memslot generation a second time, dropping the
@@ -1302,8 +1304,6 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
kvm_arch_memslots_updated(kvm, gen);
slots->generation = gen;
-
- return old_memslots;
}
/*
@@ -1342,6 +1342,7 @@ static int kvm_set_memslot(struct kvm *kvm,
struct kvm_memslots *slots;
int r;
+ mutex_lock(&kvm->slots_arch_lock);
slots = kvm_dup_memslots(__kvm_memslots(kvm, as_id), change);
if (!slots)
return -ENOMEM;
@@ -1353,14 +1354,7 @@ static int kvm_set_memslot(struct kvm *kvm,
*/
slot = id_to_memslot(slots, old->id);
slot->flags |= KVM_MEMSLOT_INVALID;
-
- /*
- * We can re-use the old memslots, the only difference from the
- * newly installed memslots is the invalid flag, which will get
- * dropped by update_memslots anyway. We'll also revert to the
- * old memslots if preparing the new memory region fails.
- */
- slots = install_new_memslots(kvm, as_id, slots);
+ install_new_memslots(kvm, as_id, slots);
/* From this point no new shadow pages pointing to a deleted,
* or moved, memslot will be created.
@@ -1370,6 +1364,9 @@ static int kvm_set_memslot(struct kvm *kvm,
* - kvm_is_visible_gfn (mmu_check_root)
*/
kvm_arch_flush_shadow_memslot(kvm, slot);
+
+ mutex_lock(&kvm->slots_arch_lock);
+ slots = kvm_dup_memslots(__kvm_memslots(kvm, as_id), change);
}
r = kvm_arch_prepare_memory_region(kvm, new, mem, change);
@@ -1377,16 +1374,17 @@ static int kvm_set_memslot(struct kvm *kvm,
goto out_slots;
update_memslots(slots, new, change);
- slots = install_new_memslots(kvm, as_id, slots);
+ install_new_memslots(kvm, as_id, slots);
kvm_arch_commit_memory_region(kvm, mem, old, new, change);
-
- kvfree(slots);
return 0;
out_slots:
- if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
+ if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
+ slot = id_to_memslot(slots, old->id);
+ slot->flags &= ~KVM_MEMSLOT_INVALID;
slots = install_new_memslots(kvm, as_id, slots);
+ }
kvfree(slots);
return r;
}

One could optimize things a bit by reusing the allocation and only
doing a memcpy from the new memslots array to the old one under the
slots_arch_lock. (Plus the above still lacks a mutex_init and
should be split in two patches, with the mutex going in the second;
but you get the idea and code sometimes is easier than words).

Paolo