Re: [PATCH 1/2] KVM: X86: Check pte present first in __shadow_walk_next()

From: Sean Christopherson
Date: Thu Aug 12 2021 - 12:03:57 EST


On Thu, Aug 12, 2021, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
>
> So far, the loop bodies already ensure the pte is present before calling
> __shadow_walk_next(). But checking pte present in __shadow_walk_next()
> is a more prudent way of programing and loop bodies will not need
> to always check it. It allows us removing is_shadow_present_pte()
> in the loop bodies.

There needs to be more analysis in the changelog, as there are many more callers
to __shadow_walk_next() than the three that are modified in the next patch. It
might even make sense to squash the two patches together, i.e. make it a "move"
instead of an "add + remove", and then explicitly explain why it's ok to add the
check for paths that do _not_ currently have a !is_shadow_present_pte() in the
loop body.

Specifically, FNAME(fetch) via shadow_walk_next() and __direct_map() via
for_each_shadow_entry() do not currently terminate their walks with a !PRESENT,
but they get away with it because they install present non-leaf SPTEs in the loop
itself.

The other argument for the audit (changelog+patch) of all users is that the next
patch misses FNAME(invlpg), e.g.

@@ -977,7 +980,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
FNAME(update_pte)(vcpu, sp, sptep, &gpte);
}

- if (!is_shadow_present_pte(*sptep) || !sp->unsync_children)
+ if (!sp->unsync_children)
break;
}
write_unlock(&vcpu->kvm->mmu_lock);

It would also be worthwhile to document via the changelog that terminating on
!is_shadow_present_pte() is 100% the correct behavior, as walking past a !PRESENT
SPTE would lead to attempting to read a the next level SPTE from a garbage
iter->shadow_addr.

And for clarity and safety, I think it would be worth adding the patch below as
a prep patch to document and enforce that walking the non-leaf SPTEs when faulting
in a page should never terminate early.