Re: [PATCH] kvm: x86: emulate MSR_PLATFORM_INFO msr bits

From: Sean Christopherson
Date: Mon Aug 21 2023 - 12:12:18 EST


+Aaron

When resending a patch, e.g. to change To: or Cc:, tag it RESEND. I got three
copies of this...

On Mon, Aug 21, 2023, Hao Xiang wrote:
>
>
> On 2023/8/21 18:44, Chao Gao wrote:
> > On Mon, Aug 21, 2023 at 05:11:16PM +0800, Hao Xiang wrote:
> > > For reason that,
> > >
> > > The turbo frequency info depends on specific machine type. And the msr value
> > > of MSR_PLATFORM_INFO may be diferent on diffrent generation machine.
> > >
> > > Get following msr bits (needed by turbostat on intel platform) by rdmsr
> > > MSR_PLATFORM_INFO directly in KVM is more reasonable. And set these msr bits
> > > as vcpu->arch.msr_platform_info default value.
> > > -bit 15:8, Maximum Non-Turbo Ratio (MAX_NON_TURBO_LIM_RATIO)
> > > -bit 47:40, Maximum Efficiency Ratio (MAX_EFFICIENCY_RATIO)
> >
> > I don't get why QEMU cannot do this with the existing interface, e.g.,
> > KVM_SET_MSRS.
> >
> > will the MSR value be migrated during VM migration?
> >
> > looks we are in a dilemma. on one side, if the value is migrated, the value can
> > become inconsisntent with hardware value. On the other side, changing the ratio
> > bits at runtime isn't the architectural behavior.
> >
> > And the MSR is per-socket. In theory, a system can have two sockets with
> > different values of the MSR. what if a vCPU is created on a socket and then
> > later runs on the other socket?
> >
>
> Set these msr bits (needed by turbostat on intel platform) in KVM by
> default.
> Of cource, QEMU can also set MSR value by need. It does not conflict.

It doesn't conflict per se, but it's still problematic. By stuffing a default
value, KVM _forces_ userspace to override the MSR to align with the topology and
CPUID defined by userspace. And if userspace uses KVM's "default" CPUID, or lack
thereof, using the underlying values from hardware are all but guaranteed to be
wrong.

The existing code that sets MSR_PLATFORM_INFO_CPUID_FAULT really should not exist,
i.e. KVM shouldn't shouldn't assume userspace wants to expose CPUID faulting to
the guest. That particular one probably isn't worth trying to retroactively fix.

Ditto for setting MSR_IA32_ARCH_CAPABILITIES; KVM is overstepping, but doing so
likely doesn't cause problems.

MSR_IA32_PERF_CAPABILITIES is a different story. Setting a non-zero default value
is blatantly wrong, as KVM will advertise vPMU features even if userspace doesn't
advertise. Aaron is planning on sending a patch for this one (I'm hoping we can
get away with retroactively dropping the code without having to add a quirk).

*If* we need KVM to expose the ratios to userspace, then the correct way to do so
is handle turbo and efficiency ratio information is to by implementing support in
kvm_get_msr_feature(), i.e. KVM_GET_MSRS on /dev/kvm. Emphasis on "if", because
I would prefer to do nothing in KVM if that information is already surfaced to
userspace through other mechanisms in the kernel.