Re: [PATCH v1 1/3] KVM: x86: Convert TDP level calculation to vendor's specific code

From: Wei Huang
Date: Thu Aug 05 2021 - 18:26:31 EST

On 8/5/21 4:51 PM, Sean Christopherson wrote:
> On Thu, Aug 05, 2021, Wei Huang wrote:
>> Currently the TDP level for an x86 vCPU is calculated by checking both
>> MAXPHYADDR and max_tdp_level. This design assumes that all x86 CPUs have
>> the flexibility of using a nested page table level different from the host
>> CPU's. This assumption might not be true.
>
> Heh, no need to be circumspect, just state that 5-level NPT inherits CR4.LA57
> from the host. I didn't fully understand this sentence until I looked at patch 3.

Sure, I will fix the commit description accordingly.

>
>> To solve this problem, let us
>> create a kvm_x86_ops-specific function for the TDP level calculation.
>>
>> Signed-off-by: Wei Huang <wei.huang2@xxxxxxx>
>> ---
>
> ...
>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 974cbfb1eefe..20ddfbac966e 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -723,7 +723,6 @@ struct kvm_vcpu_arch {
>>
>> u64 reserved_gpa_bits;
>> int maxphyaddr;
>> - int max_tdp_level;
>
> Ha, this is leftover crud that can get zapped no matter what.
>

Correct, this field is no longer used and should be removed regardless.

>> /* emulate context */
>>
>
> ...
>
>> -static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
>> -{
>> - /* Use 5-level TDP if and only if it's useful/necessary. */
>> - if (max_tdp_level == 5 && cpuid_maxphyaddr(vcpu) <= 48)
>
> I'd strongly prefer to keep this logic in the MMU. When this was in vendor code,
> there were multiple bugs where the MMU and VMX didn't communicate correctly; I
> really don't want to go back down that road.
>
> Actually, I'm very, very tempted to say we should simply drop the cpuid_maxphyaddr()
> bit and just return the max level (and I suppose rename it), e.g.
>
> return mmu_tdp_level;
>
> It's effectively a single 4kb page per VM, and Intel's numbers on 5-level paging
> were that there was no measurable cost to the extra level. I would hope that
> holds true here, too.

Wasting 4KB per VM is probably acceptable. My concern is the unnecessary perf
cost of walking one extra page-table level. But if you think that hit is
negligible, then returning mmu_tdp_level without checking cpuid_maxphyaddr()
is indeed cleaner.
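Just to confirm my understanding, the helper would then collapse to roughly
the following (a minimal sketch, assuming the MAXPHYADDR check is dropped
entirely and max_tdp_level is renamed to mmu_tdp_level as you suggested; the
now-unused vcpu parameter could probably be dropped as well):

static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
{
	/* Always use the max level vendor code passed to kvm_configure_mmu(). */
	return mmu_tdp_level;
}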

>
> If we want to keep the MAXPHYADDR behavior, I'd vote for something like:
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index b4b65c21b2ca..7e35f2bf89b4 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -97,6 +97,7 @@ module_param_named(flush_on_reuse, force_flush_and_sync_on_reuse, bool, 0644);
> bool tdp_enabled = false;
>
> static int max_huge_page_level __read_mostly;
> +static int tdp_root_level __read_mostly;
> static int max_tdp_level __read_mostly;
>
> enum {
> @@ -4645,6 +4646,9 @@ static union kvm_mmu_role kvm_calc_mmu_role_common(struct kvm_vcpu *vcpu,
>
> static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
> {
> + if (tdp_root_level)
> + return tdp_root_level;
> +
> /* Use 5-level TDP if and only if it's useful/necessary. */
> if (max_tdp_level == 5 && cpuid_maxphyaddr(vcpu) <= 48)
> return 4;
> @@ -5336,10 +5340,11 @@ void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid)
> */
> }
>
> -void kvm_configure_mmu(bool enable_tdp, int tdp_max_root_level,
> - int tdp_huge_page_level)
> +void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
> + int tdp_max_root_level, int tdp_huge_page_level)
> {
> tdp_enabled = enable_tdp;
> + tdp_root_level = tdp_forced_root_level;
> max_tdp_level = tdp_max_root_level;
>
> /*
>
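
This refactoring looks good to me and keeps the decision in the MMU as you
prefer. With the extra parameter, I'd expect the vendor call sites to end up
looking roughly like this (a rough sketch only, assuming the reworked series
adds a get_npt_level() helper on the SVM side; the VMX helpers are the
existing ones):

	/* SVM: NPT follows the host paging mode, so force the root level. */
	kvm_configure_mmu(npt_enabled, get_npt_level(), get_npt_level(),
			  PG_LEVEL_1G);

	/* VMX: no forced level (0), let the MMU pick anything up to the max. */
	kvm_configure_mmu(enable_ept, 0, vmx_get_max_tdp_level(),
			  ept_caps_to_lpage_level(vmx_capability.ept));

If that matches what you had in mind, I will rework the series along these
lines.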