Re: [PATCH] kvm: don't lose the higher 32 bits of tlbs_dirty

From: Paolo Bonzini
Date: Tue Dec 15 2020 - 05:34:16 EST


On 14/12/20 18:20, Sean Christopherson wrote:
On Sun, Dec 13, 2020, Lai Jiangshan wrote:
From: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>

In kvm_mmu_notifier_invalidate_range_start(), tlbs_dirty is used as:
need_tlb_flush |= kvm->tlbs_dirty;
with need_tlb_flush's type being int and tlbs_dirty's type being long.

It means that tlbs_dirty is always used as int and the higher 32 bits
are ignored.

It's probably worth noting in the changelog that it's _extremely_ unlikely this
bug can cause problems in practice. It would require encountering tlbs_dirty
on a 4 billion count boundary, and KVM would need to be using shadow paging or
be running a nested guest.
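
To make the boundary case concrete, here is a minimal userspace sketch, not kernel code. The function names are hypothetical, and fixed-width unsigned types stand in for the kernel's int and long so that the truncation is well defined in ISO C (the kernel relies on the compiler's modular conversion instead):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the old pattern: a 64-bit counter is OR'd into a
 * 32-bit flag, so only the low 32 bits of the counter survive.
 */
static bool flush_needed_truncating(uint32_t unmapped, uint64_t tlbs_dirty)
{
	uint32_t need_tlb_flush = unmapped;

	need_tlb_flush |= tlbs_dirty;	/* high 32 bits are dropped here */
	return need_tlb_flush != 0;
}

/*
 * Hypothetical model of the suggested fix: test the counter directly, so
 * every bit of it contributes to the decision.
 */
static bool flush_needed_direct(uint32_t unmapped, uint64_t tlbs_dirty)
{
	return unmapped != 0 || tlbs_dirty != 0;
}
```

With tlbs_dirty equal to exactly 1ULL << 32 and nothing unmapped, the truncating variant reports that no flush is needed, while the direct check still requests one. For any count whose low 32 bits are nonzero the two agree, which is why the bug is so hard to hit.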

We can just change need_tlb_flush's type to long to
make full use of tlbs_dirty.

Hrm, this does solve the problem, but I'm not a fan of continuing to use an
integer variable as a boolean. Rather than propagate tlbs_dirty to
need_tlb_flush, what if this bug fix patch checks tlbs_dirty directly, and then
a follow up patch converts need_tlb_flush to a bool and removes the unnecessary
initialization (see below).

Indeed, the compiler should be able to convert || to | if useful and valid (it may or may not do it depending on the sizes of the types involved, but that's Someone Else's Problem and this is not really a path where every instruction matters).

Paolo
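
As a rough illustration of the || versus | remark, the two hypothetical helpers below (names are not from the patch) compute the same result, because reading the second operand has no side effects. That is the condition under which a compiler may pick the branch-free form on its own:

```c
#include <stdbool.h>
#include <stdint.h>

/* Short-circuit form, as written in the suggested patch. */
static bool either_logical(bool need_tlb_flush, int64_t tlbs_dirty)
{
	return need_tlb_flush || tlbs_dirty;
}

/*
 * Branch-free form a compiler might emit instead: the counter is first
 * reduced to 0 or 1, so no bits are lost regardless of its width.
 */
static bool either_bitwise(bool need_tlb_flush, int64_t tlbs_dirty)
{
	return need_tlb_flush | (tlbs_dirty != 0);
}
```

The explicit comparison against zero is what keeps the bitwise form correct for a 64-bit counter; OR-ing the raw counter into a narrower variable is the pattern the patch removes.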

E.g. the net result of both patches would be:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3abcb2ce5b7d..93b6986d3dfc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -473,7 +473,8 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 					const struct mmu_notifier_range *range)
 {
 	struct kvm *kvm = mmu_notifier_to_kvm(mn);
-	int need_tlb_flush = 0, idx;
+	bool need_tlb_flush;
+	int idx;

 	idx = srcu_read_lock(&kvm->srcu);
 	spin_lock(&kvm->mmu_lock);
@@ -483,11 +484,10 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 	 * count is also read inside the mmu_lock critical section.
 	 */
 	kvm->mmu_notifier_count++;
-	need_tlb_flush = kvm_unmap_hva_range(kvm, range->start, range->end,
-					     range->flags);
-	need_tlb_flush |= kvm->tlbs_dirty;
+	need_tlb_flush = !!kvm_unmap_hva_range(kvm, range->start, range->end,
+					       range->flags);
 	/* we've to flush the tlb before the pages can be freed */
-	if (need_tlb_flush)
+	if (need_tlb_flush || kvm->tlbs_dirty)
 		kvm_flush_remote_tlbs(kvm);

 	spin_unlock(&kvm->mmu_lock);

Cc: stable@xxxxxxxxxxxxxxx
Fixes: a4ee1ca4a36e ("KVM: MMU: delay flush all tlbs on sync_page path")

Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
---
virt/kvm/kvm_main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2541a17ff1c4..4e519a517e9f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -470,7 +470,8 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 					const struct mmu_notifier_range *range)
 {
 	struct kvm *kvm = mmu_notifier_to_kvm(mn);
-	int need_tlb_flush = 0, idx;
+	long need_tlb_flush = 0;

need_tlb_flush doesn't need to be initialized here; it's explicitly set via the
call to kvm_unmap_hva_range().

+	int idx;

 	idx = srcu_read_lock(&kvm->srcu);
 	spin_lock(&kvm->mmu_lock);
--
2.19.1.6.gb485710b