Re: [PATCH] kvm: don't lose the higher 32 bits of tlbs_dirty

From: Sean Christopherson
Date: Mon Dec 14 2020 - 12:23:15 EST


On Sun, Dec 13, 2020, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
>
> In kvm_mmu_notifier_invalidate_range_start(), tlbs_dirty is used as:
>	need_tlb_flush |= kvm->tlbs_dirty;
> with need_tlb_flush's type being int and tlbs_dirty's type being long.
>
> This means that tlbs_dirty is effectively truncated to an int, and its
> upper 32 bits are ignored.

It's probably worth noting in the changelog that it's _extremely_ unlikely this
bug can cause problems in practice. It would require tlbs_dirty to land on an
exact multiple of 2^32 (~4 billion), and KVM would need to be using shadow
paging or be running a nested guest.
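
For illustration, a minimal userspace sketch of the truncation (toy code, not
from the patch; the names are borrowed from the kernel for clarity, and it
assumes an LP64 target where long is 64 bits and int is 32):

#include <stdio.h>

int main(void)
{
	long tlbs_dirty = 1L << 32;	/* only bits above bit 31 are set */
	int need_tlb_flush = 0;

	need_tlb_flush |= tlbs_dirty;	/* implicit conversion back to int
					 * drops the upper 32 bits */
	printf("%d\n", need_tlb_flush);	/* prints 0, i.e. no flush would
					 * happen despite tlbs_dirty != 0 */
	return 0;
}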

> We can just change need_tlb_flush's type to long to
> make full use of tlbs_dirty.

Hrm, this does solve the problem, but I'm not a fan of continuing to use an
integer variable as a boolean. Rather than propagate tlbs_dirty into
need_tlb_flush, what if this bug-fix patch checks tlbs_dirty directly, and a
follow-up patch then converts need_tlb_flush to a bool and removes the
unnecessary initialization (see below)?

E.g. the net result of both patches would be:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3abcb2ce5b7d..93b6986d3dfc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -473,7 +473,8 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 					const struct mmu_notifier_range *range)
 {
 	struct kvm *kvm = mmu_notifier_to_kvm(mn);
-	int need_tlb_flush = 0, idx;
+	bool need_tlb_flush;
+	int idx;
 
 	idx = srcu_read_lock(&kvm->srcu);
 	spin_lock(&kvm->mmu_lock);
@@ -483,11 +484,10 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 	 * count is also read inside the mmu_lock critical section.
 	 */
 	kvm->mmu_notifier_count++;
-	need_tlb_flush = kvm_unmap_hva_range(kvm, range->start, range->end,
-					     range->flags);
-	need_tlb_flush |= kvm->tlbs_dirty;
+	need_tlb_flush = !!kvm_unmap_hva_range(kvm, range->start, range->end,
+					       range->flags);
 	/* we've to flush the tlb before the pages can be freed */
-	if (need_tlb_flush)
+	if (need_tlb_flush || kvm->tlbs_dirty)
 		kvm_flush_remote_tlbs(kvm);
 
 	spin_unlock(&kvm->mmu_lock);

Cc: stable@xxxxxxxxxxxxxxx
Fixes: a4ee1ca4a36e ("KVM: MMU: delay flush all tlbs on sync_page path")

> Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
> ---
> virt/kvm/kvm_main.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 2541a17ff1c4..4e519a517e9f 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -470,7 +470,8 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
>  					const struct mmu_notifier_range *range)
>  {
>  	struct kvm *kvm = mmu_notifier_to_kvm(mn);
> -	int need_tlb_flush = 0, idx;
> +	long need_tlb_flush = 0;

need_tlb_flush doesn't need to be initialized here; it's explicitly set by the
call to kvm_unmap_hva_range().

> +	int idx;
>
>  	idx = srcu_read_lock(&kvm->srcu);
>  	spin_lock(&kvm->mmu_lock);
> --
> 2.19.1.6.gb485710b
>