Re: [PATCH] perf/intel: Remove Perfmon-v4 counter_freezing support

From: Xu, Like
Date: Tue Nov 10 2020 - 21:42:57 EST


Hi Peter,

On 2020/11/11 4:52, Stephane Eranian wrote:
On Tue, Nov 10, 2020 at 7:37 AM Peter Zijlstra<peterz@xxxxxxxxxxxxx> wrote:
On Tue, Nov 10, 2020 at 04:12:57PM +0100, Peter Zijlstra wrote:
On Mon, Nov 09, 2020 at 10:12:37AM +0800, Like Xu wrote:
The Precise Event Based Sampling(PEBS) supported on Intel Ice Lake server
platforms can provide an architectural state of the instruction executed
after the instruction that caused the event. This patch set enables the
the PEBS via DS feature for KVM (also non) Linux guest on the Ice Lake.
The Linux guest can use PEBS feature like native:

# perf record -e instructions:ppp ./br_instr a
# perf record -c 100000 -e instructions:pp ./br_instr a

If the counter_freezing is not enabled on the host, the guest PEBS will
be disabled on purpose when host is using PEBS facility. By default,
KVM disables the co-existence of guest PEBS and host PEBS.
Thanks Stephane for clarifying the use cases for Freeze-on-[PMI|Overflow].

Please let me express it more clearly.

The goal of the whole patch set is to enable guest PEBS, regardless of
whether the counter_freezing is frozen or not. By default, it will not
support both the guest and the host to use PEBS at the same time.

Please continue reviewing the patch set, especially for the slow path
we proposed this time and related host perf changes:

- add intel_pmu_handle_guest_pebs() to __intel_pmu_pebs_event();
- add switch MSRs (PEBS_ENABLE, DS_AREA, DATA_CFG) to intel_guest_get_msrs();
- the construction of incoming parameters for perf_event_create_kernel_counter();

I believe if you understand the general idea, the comments will be very valuable.

Thanks,
Like Xu

Uuhh, what?!? counter_freezing should never be enabled, its broken. Let
me go delete all that code.
---
Subject: perf/intel: Remove Perfmon-v4 counter_freezing support

Perfmon-v4 counter freezing is fundamentally broken; remove this default
disabled code to make sure nobody uses it.

The feature is called Freeze-on-PMI in the SDM, and if it would do that,
there wouldn't actually be a problem,*however* it does something subtly
different. It globally disables the whole PMU when it raises the PMI,
not when the PMI hits.

This means there's a window between the PMI getting raised and the PMI
actually getting served where we loose events and this violates the
perf counter independence. That is, a counting event should not result
in a different event count when there is a sampling event co-scheduled.

What is implemented is Freeze-on-Overflow, yet it is described as Freeze-on-PMI.
That, in itself, is a problem. I agree with you on that point.

However, there are use cases for both modes.

I can sample on event A and count on B, C and when A overflows, I want
to snapshot B, C.
For that I want B, C at the moment of the overflow, not at the moment
the PMI is delivered. Thus, youd
would want the Freeze-on-overflow behavior. You can collect in this
mode with the perf tool,
IIRC: perf record -e '{cycles,instructions,branches:S}' ....

The other usage model is that of the replay-debugger (rr) which you are alluding
to, which needs precise count of an event including during the skid
window. For that, you need
Freeze-on-PMI (delivered). Note that this tool likely only cares about
user level occurrences of events.

As for counter independence, I am not sure it holds in all cases. If
the events are setup for user+kernel
then, as soon as you co-schedule a sampling event, you will likely get
more counts on the counting
event due to the additional kernel entries/exits caused by
interrupt-based profiling. Even if you were to
restrict to user level only, I would expect to see a few more counts.


This is known to break existing software.

Signed-off-by: Peter Zijlstra (Intel)<peterz@xxxxxxxxxxxxx>