Re: [PATCH] KVM: X86: Emulate APERF/MPERF to report actual VCPU frequency

From: Like Xu
Date: Wed Dec 22 2021 - 01:57:10 EST


On 24/6/2020 4:34 am, Jim Mattson wrote:
On Tue, Jun 23, 2020 at 12:05 PM Sean Christopherson
<sean.j.christopherson@xxxxxxxxx> wrote:

On Tue, Jun 23, 2020 at 11:39:16AM -0700, Jim Mattson wrote:
On Tue, Jun 23, 2020 at 11:29 AM Sean Christopherson
<sean.j.christopherson@xxxxxxxxx> wrote:

On Tue, Jun 23, 2020 at 02:35:30PM +0800, Like Xu wrote:
APERF/MPERF have been used to report the current CPU frequency since commit
7d5905dc14a ("x86 / CPU: Always show current CPU frequency in /proc/cpuinfo").
But the guest kernel always reports a fixed vCPU frequency in /proc/cpuinfo,
which may confuse users, especially when turbo is enabled on the host.

Emulate the guest APERF/MPERF capability, basing its values on the host's.

Co-developed-by: Li RongQing <lirongqing@xxxxxxxxx>
Signed-off-by: Li RongQing <lirongqing@xxxxxxxxx>
Reviewed-by: Chai Wen <chaiwen@xxxxxxxxx>
Reviewed-by: Jia Lina <jialina01@xxxxxxxxx>
Signed-off-by: Like Xu <like.xu@xxxxxxxxxxxxxxx>
---

...

@@ -8312,7 +8376,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
dm_request_for_irq_injection(vcpu) &&
kvm_cpu_accept_dm_intr(vcpu);
fastpath_t exit_fastpath;
-
+ u64 enter_mperf = 0, enter_aperf = 0, exit_mperf = 0, exit_aperf = 0;
bool req_immediate_exit = false;

if (kvm_request_pending(vcpu)) {
@@ -8516,8 +8580,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
vcpu->arch.switch_db_regs &= ~KVM_DEBUGREG_RELOAD;
}

+ if (unlikely(vcpu->arch.hwp.hw_coord_fb_cap))
+ get_host_amperf(&enter_mperf, &enter_aperf);
+
exit_fastpath = kvm_x86_ops.run(vcpu);

+ if (unlikely(vcpu->arch.hwp.hw_coord_fb_cap)) {
+ get_host_amperf(&exit_mperf, &exit_aperf);
+ vcpu_update_amperf(vcpu, get_amperf_delta(enter_aperf, exit_aperf),
+ get_amperf_delta(enter_mperf, exit_mperf));
+ }
+

Is there an alternative approach that doesn't require 4 RDMSRs on every VMX
round trip? That's literally more expensive than VM-Enter + VM-Exit
combined.

It looks like we have quite a few users who are expecting this feature in different scenarios.

I will add a fast path for read-only (RO) usage and a slow path for when the guest tries to change the AMPERF values.


E.g. what about adding KVM_X86_DISABLE_EXITS_APERF_MPERF and exposing the
MSRs for read when that capability is enabled?

When would you load the hardware MSRs with the guest/host values?

Ugh, I was thinking the MSRs were read-only.

Even if they were read-only, they should power on to zero, and they
will most likely not be zero when a guest powers on.

Can we assume that "not zero when the guest is on" will not harm any guests?


Doesn't this also interact with TSC scaling?

Yes, it should!

TSC emulation already carries too much historical burden.

For practical reasons, what if we only expose the AMPERF cap
if both the host and the guest have CONSTANT_TSC and NONSTOP_TSC?
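
As a rough illustration of that gating (a sketch only; may_expose_amperf()
is a hypothetical helper, not a KVM function), the check could combine two
CPUID bits: CPUID.06H:ECX[0], which enumerates IA32_MPERF/IA32_APERF, and
CPUID.80000007H:EDX[8], the invariant TSC bit. Using the invariant TSC bit
as a stand-in for CONSTANT_TSC plus NONSTOP_TSC is an assumption of this
sketch; the kernel tracks the two flags separately.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.06H:ECX[0]: hardware coordination feedback (IA32_MPERF/IA32_APERF). */
#define CPUID_06_ECX_APERFMPERF            (1u << 0)
/* CPUID.80000007H:EDX[8]: invariant TSC. */
#define CPUID_80000007_EDX_INVARIANT_TSC   (1u << 8)

/*
 * Hypothetical policy helper (not part of KVM): expose the capability only
 * when the counters exist and the TSC is invariant. The invariant TSC bit
 * is used here as an approximation of CONSTANT_TSC + NONSTOP_TSC.
 */
static bool may_expose_amperf(uint32_t cpuid_06_ecx, uint32_t cpuid_80000007_edx)
{
	bool has_amperf = cpuid_06_ecx & CPUID_06_ECX_APERFMPERF;
	bool has_invariant_tsc =
		cpuid_80000007_edx & CPUID_80000007_EDX_INVARIANT_TSC;

	return has_amperf && has_invariant_tsc;
}
```

The helper takes raw register values rather than executing CPUID itself, so
the same check can be applied to the host's values and to the guest's
configured CPUID entries.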

One more design concern: I wonder whether it is *safe* for the guest to
read APERF/MPERF on pCPU[x] the first time and on pCPU[y] the next time.
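
To make the concern concrete, here is a minimal sketch (illustrative names,
not the code from the patch). The counters on different pCPUs are not
synchronized, so a raw value read on pCPU[y] can be smaller than an earlier
raw value read on pCPU[x]. If instead each enter/exit pair is sampled on the
same pCPU and only the deltas are accumulated into per-vCPU counters, the
guest-visible values stay monotonic across migration. The struct and helper
names below are assumptions made for this example.

```c
#include <stdint.h>

/* Hypothetical per-vCPU state; field names are illustrative only. */
struct vcpu_amperf {
	uint64_t aperf;	/* accumulated guest-visible APERF */
	uint64_t mperf;	/* accumulated guest-visible MPERF */
};

/* Unsigned subtraction stays correct across a single counter wrap. */
static uint64_t counter_delta(uint64_t enter, uint64_t exit)
{
	return exit - enter;
}

/* Account one guest run; both samples must come from the same pCPU. */
static void vcpu_account_run(struct vcpu_amperf *v,
			     uint64_t enter_aperf, uint64_t exit_aperf,
			     uint64_t enter_mperf, uint64_t exit_mperf)
{
	v->aperf += counter_delta(enter_aperf, exit_aperf);
	v->mperf += counter_delta(enter_mperf, exit_mperf);
}

/*
 * Effective frequency over an interval: base * dAPERF / dMPERF.
 * The multiplication can overflow for very large deltas; a real
 * implementation would need to guard against that.
 */
static uint64_t effective_khz(uint64_t base_khz,
			      uint64_t d_aperf, uint64_t d_mperf)
{
	if (d_mperf == 0)
		return 0;
	return base_khz * d_aperf / d_mperf;
}
```

For example, a run on pCPU[x] with APERF 1000 -> 4000 and MPERF 500 -> 2500,
followed by a run on pCPU[y] with APERF 10 -> 1510 and MPERF 20 -> 1020,
accumulates to aperf = 4500 and mperf = 3000, even though the raw APERF value
seen on pCPU[y] (1510) is lower than the one seen on pCPU[x] (4000).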

Any input?

Thanks,
Like Xu