Re: [PATCH] kvm: x86: emulate MSR_PLATFORM_INFO msr bits

From: Xiaoyao Li
Date: Thu Aug 24 2023 - 23:28:57 EST


On 8/23/2023 10:31 PM, Sean Christopherson wrote:
On Wed, Aug 23, 2023, Xiaoyao Li wrote:
On 8/22/2023 12:11 AM, Sean Christopherson wrote:
Set these MSR bits (needed by turbostat on Intel platforms) in KVM by
default. Of course, QEMU can also set the MSR value as needed. It does not
conflict.

It doesn't conflict per se, but it's still problematic. By stuffing a default
value, KVM _forces_ userspace to override the MSR to align with the topology and
CPUID defined by userspace.

I don't understand how this MSR is related to topology and CPUID?

Heh, looked at the SDM to double check myself, and the first hit when searching
for MSR_PLATFORM_INFO says:

When TSC scaling is enabled for a guest using Intel PT, the VMM should ensure
that the value of Maximum Non-Turbo Ratio[15:8] in MSR_PLATFORM_INFO (MSR 0CEH)
and the TSC/"core crystal clock" ratio (EBX/EAX) in CPUID leaf 15H are set in
a manner consistent with the resulting TSC rate that will be visible to the VM.

I see.
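
To make the SDM requirement concrete, here is a minimal sketch of the consistency check it implies. This is not KVM or QEMU code; the helper names are hypothetical, and the 100 MHz bus clock used to turn the ratio into a frequency is an assumption that holds for many recent Intel cores but not all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 100 MHz bus clock; some older parts use 133.33 MHz. */
#define ASSUMED_BUS_CLOCK_KHZ 100000u

/* Maximum Non-Turbo Ratio lives in bits 15:8 of MSR_PLATFORM_INFO (0xCE). */
static inline uint32_t platform_info_max_non_turbo_ratio(uint64_t msr)
{
    return (uint32_t)((msr >> 8) & 0xff);
}

/*
 * TSC frequency implied by CPUID.15H: crystal_hz (ECX) * EBX / EAX.
 * Returns 0 when the leaf does not enumerate enough to compute it.
 */
static inline uint64_t cpuid15_tsc_khz(uint32_t eax, uint32_t ebx,
                                       uint32_t ecx_hz)
{
    if (eax == 0 || ebx == 0 || ecx_hz == 0)
        return 0;
    return ((uint64_t)ecx_hz * ebx / eax) / 1000;
}

/* Hypothetical check: do the MSR and CPUID.15H describe the same rate? */
static inline bool tsc_views_consistent(uint64_t msr, uint32_t eax,
                                        uint32_t ebx, uint32_t ecx_hz,
                                        uint64_t tolerance_khz)
{
    uint64_t from_cpuid = cpuid15_tsc_khz(eax, ebx, ecx_hz);
    uint64_t from_msr = (uint64_t)platform_info_max_non_turbo_ratio(msr) *
                        ASSUMED_BUS_CLOCK_KHZ;
    uint64_t diff;

    /* No CPUID.15H information means there is nothing to contradict. */
    if (from_cpuid == 0)
        return true;

    diff = from_cpuid > from_msr ? from_cpuid - from_msr
                                 : from_msr - from_cpuid;
    return diff <= tolerance_khz;
}
```

For example, a 24 MHz crystal with EBX/EAX = 250/2 implies a 3 GHz TSC, which agrees with a Maximum Non-Turbo Ratio of 30 under the assumed bus clock.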

As Chao pointed out, the MSR is technically per package, so a weird setup could
have sockets with different frequencies, or enumerate a virtual topology to the
guest with such a configuration.

Any feature can get into trouble if it is not consistent across packages, regardless of whether its scope is per-thread, per-core, or per-package.

I doubt/hope no one actually does something
like that, but it's theoretically possible, and one of the many reasons why KVM
needs to stay out of the way and let userspace define the vCPU model.

For this specific case, the max non-turbo frequency needs to be consistent with the TSC frequency. Because KVM defaults the guest TSC frequency to the host's tsc_khz, for correctness it should also provide a default value that matches KVM's default TSC frequency when userspace provides no explicit configuration.
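
As a rough illustration of what such a default could look like, the sketch below derives bits 15:8 from a TSC frequency in kHz. It is not taken from the patch or from KVM; the function name is hypothetical, and the 100 MHz bus clock is an assumption.

```c
#include <stdint.h>

/* Assumption: the ratio is expressed in units of a 100 MHz bus clock. */
#define SKETCH_BUS_CLOCK_KHZ 100000u

/*
 * Hypothetical helper: return 'base' with Maximum Non-Turbo Ratio
 * (bits 15:8) replaced by the ratio closest to tsc_khz. All other
 * bits of 'base' are preserved.
 */
static inline uint64_t platform_info_from_tsc_khz(uint64_t base,
                                                  uint32_t tsc_khz)
{
    /* Round to the nearest ratio; do the math in 64 bits to avoid wrap. */
    uint64_t ratio = ((uint64_t)tsc_khz + SKETCH_BUS_CLOCK_KHZ / 2) /
                     SKETCH_BUS_CLOCK_KHZ;

    /* The field is only 8 bits wide, so clamp instead of truncating. */
    if (ratio > 0xff)
        ratio = 0xff;

    base &= ~(0xffull << 8);
    return base | (ratio << 8);
}
```

With a host tsc_khz of 2900000 this yields a ratio of 29 in bits 15:8; whether KVM should do this at all is exactly what is being debated in this thread.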

But that's not the problem this patch targets. I'm OK with continuing to return 0 as-is until a bug is reported due to an inconsistency between the max non-turbo frequency and the TSC frequency.

And if userspace uses KVM's "default" CPUID, or lack thereof, using the
underlying values from hardware is all but guaranteed to be wrong.

Could you please elaborate?

I guess an empty CPUID would probably be ok? If there's no CPUID.0x15, it can't
be wrong. It's largely a moot point though, I highly doubt anyone runs a "real"
VM without populating _something_ in guest CPUID.

Current QEMU doesn't configure CPUID leaf 0x15, nor does it configure MSR_PLATFORM_INFO[15:8]. I need to take some time to dig into how Linux gets the TSC frequency.