[PATCH 1/1] KVM: x86/vPMU: Check PMU is enabled for vCPU before searching for PMC

From: Konstantin Khorenko
Date: Thu Nov 09 2023 - 13:24:09 EST


The following 2 mainstream patches have introduced extra
events accounting:

018d70ffcfec ("KVM: x86: Update vPMCs when retiring branch instructions")
9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions")

kvm_pmu_trigger_event() iterates over all PMCs looking for enabled and
this appeared to be fast on Intel CPUs and quite expensive for AMD CPUs.

kvm_pmu_trigger_event() can be optimized not to iterate over all PMCs in
the following cases:

* if PMU is completely disabled for a VM, which is the default
configuration
* if PMU v2 is enabled, but no PMCs are configured

For Intel CPUs:
* By default PMU is disabled for KVM VMs (<pmu state='off'/> or absent
in the VM config xml which results in "-cpu pmu=off" qemu option).
In this case pmu->version is reported as 0 for the appropriate vCPU.

* According to Intel® 64 and IA-32 Architectures Software Developer’s
Manual PMU version 2 and higher provide IA32_PERF_GLOBAL_CTRL MSR
which in particular contains bits which can be used for efficient
detection which fixed-function performance and general-purpose
performance monitoring counters are enabled at the moment.

* Searching for enabled PMCs is fast and the optimization does not
bring noticeable performance increase.

For AMD CPUs:
* For CPUs older than Zen 4 pmu->version is always reported as "1" for
the appropriate vCPU, no matter if PMU is disabled for KVM VMs
(<pmu state='off'/>) or enabled.
So for "old" CPUs currently it's impossible to detect when PMU is
disabled for a VM and skip the iteration by PMCs efficiently.

* Since Zen 4 AMD CPUs support PMU v2 and in this case pmu->version
should be reported as "2" and IA32_PERF_GLOBAL_CTRL MSR is available
and can be used for fast and efficient check for enabled PMCs.
https://www.phoronix.com/news/AMD-PerfMonV2-Linux-Patches
https://www.phoronix.com/news/AMD-PerfMonV2-Guests-KVM

* Optimized preliminary check for enabled PMCs on AMD Zen 4 CPUs
should give quite noticeable performance improvement.

AMD performance results:
CPU: AMD Zen 3 (three!): AMD EPYC 7443P 24-Core Processor

* The test binary is run inside an AlmaLinux 9 VM with their stock kernel
5.14.0-284.11.1.el9_2.x86_64.
* Test binary checks the CPUID instractions rate (instructions per sec).
* Default VM config (PMU is off, pmu->version is reported as 1).
* The Host runs the kernel under test.

# for i in 1 2 3 4 5 ; do ./at_cpu_cpuid.pub ; done | \
awk -e '{print $4;}' | \
cut -f1 --delimiter='.' | \
./avg.sh

Measurements:
1. Host runs stock latest mainstream kernel commit 305230142ae0.
2. Host runs same mainstream kernel + current patch.
3. Host runs same mainstream kernel + current patch + force
guest_pmu_is_enabled() to always return "false" using following change:

- if (pmu->version >= 2 && !(pmu->global_ctrl & ~pmu->global_ctrl_mask))
+ if (pmu->version == 1 && !(pmu->global_ctrl & ~pmu->global_ctrl_mask))

--------------------------------------
| Kernels | CPUID rate |
--------------------------------------
| 1. | 1360250 |
| 2. | 1365536 (+ 0.4%) |
| 3. | 1541850 (+13.4%) |
--------------------------------------

Measurement (2) gives some fluctuation, the performance is not increased
because the test was done on a Zen 3 CPU, so we are unable to use fast
check for active PMCs.
Measurement (3) shows expected performance boost on a Zen 4 CPU under
the same test.

Signed-off-by: Konstantin Khorenko <khorenko@xxxxxxxxxxxxx>
---
arch/x86/kvm/pmu.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 9ae07db6f0f6..290d407f339b 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -731,12 +731,38 @@ static inline bool cpl_is_matched(struct kvm_pmc *pmc)
return (static_call(kvm_x86_get_cpl)(pmc->vcpu) == 0) ? select_os : select_user;
}

+static inline bool guest_pmu_is_enabled(struct kvm_pmu *pmu)
+{
+ /*
+ * Currently VMs do not have PMU settings in configs which defaults
+ * to "pmu=off".
+ *
+ * For Intel currently this means pmu->version will be 0.
+ * For AMD currently PMU cannot be disabled:
+ * pmu->version should be 2 for Zen 4 cpus and 1 otherwise.
+ */
+ if (pmu->version == 0)
+ return false;
+
+ /*
+ * Starting with PMU v2 IA32_PERF_GLOBAL_CTRL MSR is available and
+ * it can be used to check if none PMCs are enabled.
+ */
+ if (pmu->version >= 2 && !(pmu->global_ctrl & ~pmu->global_ctrl_mask))
+ return false;
+
+ return true;
+}
+
void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 perf_hw_id)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
struct kvm_pmc *pmc;
int i;

+ if (!guest_pmu_is_enabled(pmu))
+ return;
+
for_each_set_bit(i, pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX) {
pmc = static_call(kvm_x86_pmu_pmc_idx_to_pmc)(pmu, i);

--
2.39.3