Re: [PATCH v4 11/12] KVM: x86/svm/pmu: Add AMD PerfMonV2 support

From: Like Xu
Date: Mon Apr 10 2023 - 07:34:26 EST


On 7/4/2023 10:44 pm, Sean Christopherson wrote:
On Fri, Apr 07, 2023, Like Xu wrote:
On 7/4/2023 9:35 am, Sean Christopherson wrote:
On Tue, Feb 14, 2023, Like Xu wrote:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
+ if (!msr_info->host_initiated)
+ return 0; /* Writes are ignored */

Where is the "writes ignored" behavior documented? I can't find anything in the
APM that defines write behavior.

KVM follows real hardware behavior wherever the specifications are silent
on the details or leave them undisclosed.

So is that a "this isn't actually documented anywhere" answer? It's not your
responsibility to get AMD to document their CPUs, but I want to clearly document
when KVM's behavior is based solely off of observed hardware behavior, versus an
actual specification.

Indeed, you draw a clearer line than APM or PPR.

Spec-defined:

RO: Read-only. Readable; writes are ignored (Per PPR "AccessType Definitions")
WO: Writable. Reads are undefined. (Per PPR "AccessType Definitions")

For behaviour the specs leave undefined (or hidden), the vPMU follows what is
observed on real HW. More comments in the new version may help; please check.


How about this:

/*
* Note, AMD ignores writes to reserved bits and read-only PMU MSRs,
* whereas Intel generates #GP on attempts to write reserved/RO MSRs.
*/

Looks good.

+ pmu->nr_arch_gp_counters = min_t(unsigned int,
+ ebx.split.num_core_pmc,
+ kvm_pmu_cap.num_counters_gp);
+ } else if (guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE)) {
pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS_CORE;

This needs to be sanitized, no? E.g. if KVM only has access to 4 counters, but
userspace sets X86_FEATURE_PERFCTR_CORE anyways. Hrm, unless I'm missing something,
that's a pre-existing bug.

Now your point is that if user space sets more capabilities than KVM can
support, KVM should constrain it.
Your previous preference was that user space can set capabilities even if
KVM doesn't support them, as long as doing so doesn't break KVM or the host,
and the guest lives with the consequences.

Letting userspace define a "bad" configuration is perfectly ok, but KVM needs to
be careful not to endanger itself by consuming the bad state. A good example is
the handling of nested SVM features in svm_vcpu_after_set_cpuid(). KVM lets
userspace define anything and everything, but KVM only actually tries to utilize
a feature if the feature is actually supported in hardware.

In this case, it's not clear to me that putting a bogus value into "nr_arch_gp_counters"
is safe (for KVM). And AIUI, the guest can't actually use more than
kvm_pmu_cap.num_counters_gp counters, i.e. KVM isn't arbitrarily restricting the
setup.

AFAICT, when a guest has more counters (N) than the host (M) and all of them are
enabled, KVM creates an equal number (N) of perf_events, and the host perf scheduler
rotates these events across the M real hardware counters in a round-robin fashion.

From the point of view of a vCPU, each virtual counter holds a hardware counter for
only part of the time while the guest payload runs, which affects accuracy. However,
from the host security point of view, too many counters only result in too many
perf_events created by KVM, which is normal usage for the perf subsystem, known as
perf counter multiplexing. It seems to be safe (KVM only uses the perf API).

But since scheduling too many perf_events also adds overhead, it can also be seen
as a performance attack on the scheduling of vCPU processes on the host.
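
As a rough illustration of the multiplexing argument above, the sketch below
rotates N events over M hardware counters one tick at a time and records how
long each event actually held a counter. It is a deliberately simplified model
and not the real perf scheduler; the function name and the "tick" granularity
are assumptions made for the example.

```c
#include <assert.h>

/*
 * Simplified round-robin model: on every tick, up to m_hw events are placed
 * on hardware counters, starting from a rotating offset.
 * ticks_on_hw[i] ends up holding the number of ticks event i was counting.
 */
static void model_multiplex(unsigned int n_events, unsigned int m_hw,
			    unsigned int ticks, unsigned int *ticks_on_hw)
{
	unsigned int scheduled, start = 0;
	unsigned int i, t, k;

	assert(n_events > 0 && m_hw > 0 && ticks_on_hw != 0);

	for (i = 0; i < n_events; i++)
		ticks_on_hw[i] = 0;

	/* No more events can run at once than there are hardware counters. */
	scheduled = n_events < m_hw ? n_events : m_hw;

	for (t = 0; t < ticks; t++) {
		for (k = 0; k < scheduled; k++)
			ticks_on_hw[(start + k) % n_events]++;
		start = (start + 1) % n_events;	/* rotate for fairness */
	}
}
```

With 6 events on 4 counters over 6 ticks, every event in this model holds a
counter for 4 ticks, i.e. an M/N share of the time. The remaining time is what
costs the guest accuracy, and the rotation itself is the host-side overhead.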

Back to the diff itself: the intel_pmu code does a similar sanity check, so here we
just let the AMD PMU follow the same pattern. Please refer to the latest version.
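
For reference, the sanity check boils down to a clamp like the one below. This
is only a sketch under stated assumptions, not the code from the latest
version: the helper name and MODEL_MAX_GP_COUNTERS are hypothetical, with the
latter standing in for whatever bounds the per-vCPU counter array.

```c
#include <assert.h>

/* Hypothetical upper bound on the per-vCPU counter array in this model. */
#define MODEL_MAX_GP_COUNTERS 6

/*
 * Never consume more counters than the host side can back, regardless of
 * what userspace advertised to the guest via CPUID.
 */
static unsigned int model_sanitize_nr_gp_counters(unsigned int guest_requested,
						  unsigned int host_supported)
{
	unsigned int nr;

	/* The host capability itself must fit the model's array bound. */
	assert(host_supported <= MODEL_MAX_GP_COUNTERS);

	nr = guest_requested < host_supported ? guest_requested
					      : host_supported;
	return nr;
}
```

Userspace can still advertise whatever it likes; the clamp only limits what
KVM itself consumes.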