Re: [PATCH] KVM: x86/pmu: Prevent any host user from enabling PEBS for profiling guest

From: Like Xu
Date: Thu Nov 30 2023 - 02:30:07 EST


On 29/11/2023 10:38 pm, Liang, Kan wrote:


On 2023-11-29 4:50 a.m., Like Xu wrote:
From: Like Xu <likexu@xxxxxxxxxxx>

Stop using PEBS counters on host to profiling guest. Limit the range of
enabled PEBS counters to only those counters enabled from the guest PEBS
emulation perspective.

If there is a perf-record agent on host that uses perf-tools events like
"cpu-cycles:GP" (G for attr.exclude_host, P for max precise event counter)
to capture guest performance events, then the guest will be hanged. This is
because Intel DS-based PEBS buffer is addressed using the 64-bit linear
address of the current {p/v}CPU context based on MSR_IA32_DS_AREA.

Any perf user using PEBS counters to profile guest on host is, in perf/core
implementation details, trying to set bits on cpuc->intel_ctrl_guest_mask
and arr[pebs_enable].guest, much like the guest PEBS emulation behaviour.
But the subsequent PEBS memory write, regardless of whether guest PEBS is
enabled, can overshoot guest entry and corrupt guest memory.

Profiling guest via PEBS-DS buffer on host is not supported at this time.
Fix this by filtering the real configured value of arr[pebs_enable].guest
with the emulated state of guest enabled PEBS counters, under the condition
of none cross-mapped PEBS counters.

So the counter will be silently disabled. The user never knows why
nothing is sampled.
Since we don't support the case, profiling guest via PEBS-DS buffer on
host. Maybe we should error out when creating the event. For example
(not tested),

Test failed.


diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 3871267d3237..24b90c70737f 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3958,6 +3958,10 @@ static int intel_pmu_hw_config(struct perf_event
*event)
if ((event->attr.config & INTEL_ARCH_EVENT_MASK) ==
INTEL_FIXED_VLBR_EVENT)
return -EINVAL;

+ /* Profiling guest via PEBS-DS buffer on host is not supported. */
+ if (event->attr.exclude_host)
+ return -EINVAL;
+

Guest PEBS emulation also sets this bit, a typical call stack looks like:

intel_pmu_hw_config+0x441/0x4d0
hsw_hw_config+0x12/0xa0
x86_pmu_event_init+0x98/0x370
perf_try_init_event+0x47/0x130
perf_event_alloc+0x446/0xeb0
perf_event_create_kernel_counter+0x38/0x190
pmc_reprogram_counter.constprop.17+0xd9/0x230 [kvm]
kvm_pmu_handle_event+0x1a6/0x310 [kvm]
vcpu_enter_guest+0x1388/0x19b0 [kvm]
vcpu_run+0x117/0x6c0 [kvm]
kvm_arch_vcpu_ioctl_run+0x13d/0x4d0 [kvm]
kvm_vcpu_ioctl+0x301/0x6e0 [kvm]

Alternatively, this path is taken when using PEBS-via-PT to profile guests on host.

The status of the guest can only be queried in the NMI handler and the func
intel_guest_get_msrs() in the perf/core context, where it's easier and more
centrally to review this part of changes that affects vPMU for corner cases.

Maybe adding print info on the perf-tool side would help.

For perf-tool users, it will get 0 number of sample for "cpu-cycles:GP" events,
just like other uncounted perf-tool events.

if (!(event->attr.freq || (event->attr.wakeup_events &&
!event->attr.watermark))) {
event->hw.flags |= PERF_X86_EVENT_AUTO_RELOAD;
if (!(event->attr.sample_type &


Thanks,
Kan


Cc: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
Signed-off-by: Like Xu <likexu@xxxxxxxxxxx>
---
arch/x86/events/intel/core.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index a08f794a0e79..17afd504c35b 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4103,13 +4103,19 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
.guest = pebs_mask & ~cpuc->intel_ctrl_host_mask,
};
+ /* In any case, clear guest PEBS bits first. */
+ arr[global_ctrl].guest &= ~arr[pebs_enable].guest;
+
if (arr[pebs_enable].host) {
/* Disable guest PEBS if host PEBS is enabled. */
arr[pebs_enable].guest = 0;
} else {
/* Disable guest PEBS thoroughly for cross-mapped PEBS counters. */
arr[pebs_enable].guest &= ~kvm_pmu->host_cross_mapped_mask;
- arr[global_ctrl].guest &= ~kvm_pmu->host_cross_mapped_mask;
+
+ /* Prevent any host user from enabling PEBS for profiling guest. */
+ arr[pebs_enable].guest &= (kvm_pmu->pebs_enable & kvm_pmu->global_ctrl);
+
/* Set hw GLOBAL_CTRL bits for PEBS counter when it runs for guest */
arr[global_ctrl].guest |= arr[pebs_enable].guest;
}

base-commit: 6803fb00772cc50cd59a66bd8caaee5c84b13fcf