Re: [PATCH 06/23] KVM: MMU: load new PGD once nested two-dimensional paging is initialized

From: Paolo Bonzini
Date: Wed Feb 09 2022 - 07:34:38 EST


On 2/4/22 20:18, David Matlack wrote:
> On Fri, Feb 04, 2022 at 06:57:01AM -0500, Paolo Bonzini wrote:
> > __kvm_mmu_new_pgd looks at the MMU's root_level and shadow_root_level
> > via fast_pgd_switch.
>
> Those checks are just for performance, correct (to skip iterating through
> the list of roots)?
>
> Either way, it's probably worth including a Fixes tag below.

It's actually a much bigger mess. The code is working as intended, except that in some cases the caching is suboptimal.

Basically, you have to call __kvm_mmu_new_pgd *before* kvm_init_mmu because of how fast_pgd_switch stashes the current root in the cache. A PAE root is never placed in the cache, exactly because mmu->root_level and mmu->shadow_root_level hold the values for the _current_ root: fast_pgd_switch only attempts the cache lookup when both levels indicate a 64-bit root, so with a PAE root it does not look for a cached PGD at all (even if, according to the role, one could be there).
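
For reference, the pre-change logic looks roughly like this (paraphrased from v5.16; cached_root_available is the existing helper):

    static bool fast_pgd_switch(struct kvm_vcpu *vcpu, gpa_t new_pgd,
                                union kvm_mmu_page_role new_role)
    {
            struct kvm_mmu *mmu = vcpu->arch.mmu;

            /*
             * The levels tested here describe the *current* root, so a
             * PAE root never reaches cached_root_available() and is
             * neither stashed in nor fetched from the prev_roots cache.
             */
            if (mmu->root_level >= PT64_ROOT_4LEVEL &&
                mmu->shadow_root_level >= PT64_ROOT_4LEVEL)
                    return cached_root_available(vcpu, new_pgd, new_role);

            return false;
    }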

__kvm_mmu_new_pgd then frees the current root using kvm_mmu_free_roots, and kvm_mmu_free_roots *also* uses mmu->root_level and mmu->shadow_root_level, in its case to distinguish whether the page table uses a single root or four PAE roots. Because kvm_init_mmu can overwrite mmu->root_level and mmu->shadow_root_level, kvm_mmu_free_roots must also be called before kvm_init_mmu.
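
The relevant check looks roughly like this (again paraphrased from v5.16); if kvm_init_mmu has already rewritten the levels for the next mode, the test no longer describes the root that is actually being freed:

    if (free_active_root) {
            if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL &&
                (mmu->root_level >= PT64_ROOT_4LEVEL || mmu->direct_map)) {
                    /* one 64-bit root backed by a shadow page */
                    mmu_free_root_page(kvm, &mmu->root_hpa, &invalid_list);
            } else if (mmu->pae_root) {
                    /* the four special PAE roots */
                    for (i = 0; i < 4; ++i) {
                            if (!IS_VALID_PAE_ROOT(mmu->pae_root[i]))
                                    continue;
                            mmu_free_root_page(kvm, &mmu->pae_root[i],
                                               &invalid_list);
                            mmu->pae_root[i] = INVALID_PAE_ROOT;
                    }
            }
    }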

I do wonder if the use of mmu->shadow_root_level was intentional (it certainly escaped me when first reviewing fast PGD switching), but fortunately none of this is necessary. PAE roots can be identified by !to_shadow_page(root_hpa), so the better fix is to rely on that:

- in fast_pgd_switch/cached_root_available, you need to account for two possible transformations of the cache: either the old root becomes the MRU entry of prev_roots, as in the current code, or the old root is a PAE root and cannot be cached at all. This is 20-30 more lines of code.

- in kvm_mmu_free_roots, just use to_shadow_page instead of mmu->root_level and mmu->shadow_root_level. Both changes are sketched below.
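
A minimal sketch of the idea; the two cached_root_find_* helpers are hypothetical names for the split of cached_root_available, not existing functions:

    static bool fast_pgd_switch(struct kvm_vcpu *vcpu, gpa_t new_pgd,
                                union kvm_mmu_page_role new_role)
    {
            struct kvm_mmu *mmu = vcpu->arch.mmu;

            /*
             * A PAE root has no struct kvm_mmu_page behind it and can
             * never be a cache entry, so free it up front.
             */
            if (VALID_PAGE(mmu->root_hpa) && !to_shadow_page(mmu->root_hpa))
                    kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOT_CURRENT);

            if (VALID_PAGE(mmu->root_hpa))
                    /* demote the current root to the MRU slot of prev_roots */
                    return cached_root_find_and_keep_current(vcpu, new_pgd, new_role);
            else
                    /* nothing to demote, only look for a cached root */
                    return cached_root_find_without_current(vcpu, new_pgd, new_role);
    }

and in kvm_mmu_free_roots the level checks collapse into a single to_shadow_page test:

    if (free_active_root) {
            if (to_shadow_page(mmu->root_hpa)) {
                    /* a single root page, whatever the levels say */
                    mmu_free_root_page(kvm, &mmu->root_hpa, &invalid_list);
            } else if (mmu->pae_root) {
                    /* not backed by a shadow page: the four PAE roots */
                    ...
            }
    }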

Once this is in place, the original bug is fixed simply by calling kvm_mmu_new_pgd after kvm_init_mmu. kvm_mmu_reset_context does not do that today, to avoid having to figure out the new role, but it can do so easily after the above change.
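
Concretely, the ordering at the call sites would then become something like this (illustrative only):

    kvm_init_mmu(vcpu);              /* compute the new role and levels first */
    kvm_mmu_new_pgd(vcpu, new_pgd);  /* freeing/caching no longer depends on them */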

I'll keep this cpu_role refactoring around, but strictly speaking it becomes a separate change from the optimization.

Paolo