Re: [PATCH 3/4] KVM: x86/mmu: explicitly check nx_hugepage in disallowed_hugepage_adjust()

From: David Matlack
Date: Mon Mar 21 2022 - 18:45:34 EST


On Mon, Mar 21, 2022 at 3:00 PM David Matlack <dmatlack@xxxxxxxxxx> wrote:
>
> On Sun, Mar 20, 2022 at 5:26 PM Mingwei Zhang <mizhang@xxxxxxxxxx> wrote:
> >
> > Add extra check to specify the case of nx hugepage and allow KVM to
> > reconstruct large mapping after dirty logging is disabled. Existing code
> > works only for nx hugepage but the condition is too general in that does
> > not consider other usage case (such as dirty logging).
>
> KVM calls kvm_mmu_zap_collapsible_sptes() when dirty logging is
> disabled. Why is that not sufficient?

Ahh I see, kvm_mmu_zap_collapsible_sptes() only zaps the leaf SPTEs.
Could you add a blurb about this in the commit message for future
reference?

>
> > Moreover, existing
> > code assumes that a present PMD or PUD indicates that there exist 'smaller
> > SPTEs' under the paging structure. This assumption may no be true if
> > consider the zapping leafs only behavior in MMU.
>
> Good point. Although, that code just got reverted. Maybe say something like:
>
> This assumption may not be true in the future if KVM gains support
> for zapping only leaf SPTEs.

Nevermind, support for zapping leaf SPTEs already exists for zapping
collapsible SPTEs.



>
> >
> > Missing the check causes KVM incorrectly regards the faulting page as a NX
> > huge page and refuse to map it at desired level. And this leads to back
> > performance in shadow mmu and potentiall TDP mmu.
>
> s/potentiall/potentially/
>
> >
> > Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
> > Cc: stable@xxxxxxxxxxxxxxx
> >
> > Reviewed-by: Ben Gardon <bgardon@xxxxxxxxxx>
> > Signed-off-by: Mingwei Zhang <mizhang@xxxxxxxxxx>
> > ---
> > arch/x86/kvm/mmu/mmu.c | 14 ++++++++++++--
> > 1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 5628d0ba637e..4d358c273f6c 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -2919,6 +2919,16 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
> > cur_level == fault->goal_level &&
> > is_shadow_present_pte(spte) &&
> > !is_large_pte(spte)) {
> > + struct kvm_mmu_page *sp;
> > + u64 page_mask;
> > + /*
> > + * When nx hugepage flag is not set, there is no reason to
> > + * go down to another level. This helps demand paging to
> > + * generate large mappings.
> > + */
> > + sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
> > + if (!sp->lpage_disallowed)
> > + return;
> > /*
> > * A small SPTE exists for this pfn, but FNAME(fetch)
> > * and __direct_map would like to create a large PTE
> > @@ -2926,8 +2936,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
> > * patching back for them into pfn the next 9 bits of
> > * the address.
> > */
> > - u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> > - KVM_PAGES_PER_HPAGE(cur_level - 1);
> > + page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> > + KVM_PAGES_PER_HPAGE(cur_level - 1);
> > fault->pfn |= fault->gfn & page_mask;
> > fault->goal_level--;
> > }
> > --
> > 2.35.1.894.gb6a874cedc-goog
> >