Re: [PATCH v2 5/8] KVM: x86: Use MTRR macros to define possible MTRR MSR ranges

From: Huang, Kai
Date: Sun May 14 2023 - 20:37:42 EST


On Fri, 2023-05-12 at 09:35 -0700, Sean Christopherson wrote:
> On Fri, May 12, 2023, Kai Huang wrote:
> > On Thu, 2023-05-11 at 16:33 -0700, Sean Christopherson wrote:
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index e7f78fe79b32..8b356c9d8a81 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -3700,8 +3700,9 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > > return 1;
> > > }
> > > break;
> > > - case 0x200 ... MSR_IA32_MC0_CTL2 - 1:
> > > - case MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) ... 0x2ff:
> > > + case MSR_IA32_CR_PAT:
> > > + case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000:
> > > + case MSR_MTRRdefType:
> > > return kvm_mtrr_set_msr(vcpu, msr, data);
> > > case MSR_IA32_APICBASE:
> > > return kvm_set_apic_base(vcpu, msr_info);
> > > @@ -4108,9 +4109,10 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > > msr_info->data = kvm_scale_tsc(rdtsc(), ratio) + offset;
> > > break;
> > > }
> > > + case MSR_IA32_CR_PAT:
> > > case MSR_MTRRcap:
> >
> > ... Should we put MSR_IA32_CR_PAT after MSR_MTRRcap so it can be symmetric to
> > kvm_set_msr_common()?
> >
> > Looks there's no reason to put it before MSR_MTRRcap.
>
> No, it's above MTRRcap for two reasons:
>
> 1. When PAT is moved out of mtrr.c, PAT doesn't get bunded with the other MTRRs
> and so would just need to be hoisted back up. The end code looks like:
>
> case MSR_IA32_CR_PAT:
> msr_info->data = vcpu->arch.pat;
> break;
> case MSR_MTRRcap:
> case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000:
> case MSR_MTRRdefType:
> return kvm_mtrr_get_msr(vcpu, msr_info->index, &msr_info->data);

Sorry I mistakenly thought MSR_MTRRcap wasn't handled in kvm_mtrr_get_msr().
Yes looks good to me.

>
> 2. There is no MSR_MTRRcap case statement in kvm_set_msr_common() because it's
> a read-only MSR, i.e. the two can't be symmetric :-)

Do you know why it is a read-only MSR, which enables both FIXED ranges and
(fixed number of) dynamic ranges?

I am asking because there's a x86 series to fake a simple synthetic MTRR which
neither has fixed nor dynamic ranges but only has a default MSR_MTRRdefType:

https://lore.kernel.org/lkml/20230509233641.GGZFrZCTDH7VwUMp5R@fat_crate.local/T/

The main use cases are Xen PV guests and SEV-SNP guests running under
Hyper-V, but it appears TDX guest is also desired to have similar handling,
because: 

1) TDX module exposes MTRR in CPUID to guest, but handles nothing about MTRR
MSRs but only injects #VE.

2) TDX module always maps guest private memory as WB (and ignores guest's PAT
IIUC);

3) For shared memory, w/o non-coherent DMA guest's MTRRs are ignored by KVM
anyway. TDX doesn't officially support non-trusted device assignment AFAICT.
Even we want to consider non-coherent DMA, it would only add confusion to honor
guest's MTRRs since they can point to private memory, which is always mapped as
WB.

So basically looks there's no value to exposing FIXED and dynamic MTRR ranges to
TDX guest.