Re: perf tool: Issues with metricgroups

From: John Garry
Date: Wed Jun 09 2021 - 06:29:49 EST


On 09/06/2021 07:15, Ian Rogers wrote:

Hi Ian,

> The fix to avoid uncore_ events being deduplicated against each other
> added complexity to the code and means that metric-no-group doesn't
> really work any more. I have it on my list of things to look at. It
> relates to what you are looking at as the deduplication afterward is
> tricky given the funny invariants on evsel names. I think it would be
> easier to deduplicate events before doing the event parse. It may also
> be good to change evsels so that they own the string for their name
> (this would mean uncore_imc events could have unique names and not get
> deduplicated against each other). The invariants around cycles in your
> change look weird, but I can see how it might work around an issue. My
> attempts to reproduce the issue weren't successful on a SkylakeX.
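
To make the "deduplicate events before doing the event parse" idea concrete, here is a rough sketch in Python rather than perf's C. The function name `dedup_metric_events` and the dict layout are hypothetical, and the event names are taken from the verbose output further down in this message. It ignores group constraints and per-PMU uncore instances, so it only illustrates the ordering of the steps, not how perf would actually do it.

```python
def dedup_metric_events(metric_events):
    """Collect each event name once, in first-seen order, across all metrics.

    metric_events: dict of metric name -> list of event names (hypothetical layout).
    Returns (unique_events, metric_refs), where metric_refs maps each metric to
    indices into unique_events, so every metric can still find its operands.
    """
    unique_events = []
    index_of = {}
    metric_refs = {}
    for metric, events in metric_events.items():
        refs = []
        for name in events:
            if name not in index_of:
                # First time this event is seen: give it a slot.
                index_of[name] = len(unique_events)
                unique_events.append(name)
            refs.append(index_of[name])
        metric_refs[metric] = refs
    return unique_events, metric_refs


metrics = {
    "Retiring": ["cycles", "uops_retired.retire_slots"],
    "Backend_Bound": [
        "uops_issued.any",
        "cycles",
        "idq_uops_not_delivered.core",
        "int_misc.recovery_cycles",
        "uops_retired.retire_slots",
    ],
}

unique_events, metric_refs = dedup_metric_events(metrics)
# Only this deduplicated string would then be handed to the event parser.
event_string = ",".join(unique_events)
print(event_string)
```

With this ordering, no evsel names have to be compared after parsing, because each event is parsed only once.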

I am a bit surprised that you could not reproduce on SkylakeX, as the metric expressions are the same.

As an experiment, I hacked mapfile.csv to make my Broadwell machine pick up the skylakex pmu-events:

diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index 5f5df6560202..3f170fc430b2 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -1,6 +1,6 @@
Family-model,Version,Filename,EventType
GenuineIntel-6-56,v5,broadwellde,core
-GenuineIntel-6-3D,v17,broadwell,core
+GenuineIntel-6-3D,v17,skylakex,core
GenuineIntel-6-47,v17,broadwell,core
GenuineIntel-6-4F,v10,broadwellx,core
GenuineIntel-6-1C,v4,bonnell,core
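
For anyone wondering why that one-line change is enough, here is a simplified model of the lookup in Python. It assumes that the Family-model column is treated as a pattern matched against the start of the CPUID string and that the first matching row wins; perf's real matching is more involved, and `lookup_pmu_events` is a hypothetical name used only for this sketch.

```python
import csv
import io
import re

# The hacked mapfile rows from the diff above.
MAPFILE = """\
Family-model,Version,Filename,EventType
GenuineIntel-6-56,v5,broadwellde,core
GenuineIntel-6-3D,v17,skylakex,core
GenuineIntel-6-47,v17,broadwell,core
GenuineIntel-6-4F,v10,broadwellx,core
GenuineIntel-6-1C,v4,bonnell,core
"""


def lookup_pmu_events(mapfile_text, cpuid):
    """Return the Filename of the first row whose pattern matches the CPUID.

    Simplifying assumption: the pattern only has to match at the start of
    the CPUID, so a trailing stepping such as "-4" is ignored.
    """
    for row in csv.DictReader(io.StringIO(mapfile_text)):
        if re.match(row["Family-model"], cpuid):
            return row["Filename"]
    return None


selected = lookup_pmu_events(MAPFILE, "GenuineIntel-6-3D-4")
print(selected)
```

Under these assumptions, the CPUID of my machine (GenuineIntel-6-3D-4) now selects the skylakex events instead of the broadwell ones.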


And I still see the issue:

john@localhost:~/acme/tools/perf> sudo ./perf stat -v -M retiring,backend_bound sleep 1
Using CPUID GenuineIntel-6-3D-4
metric expr uops_retired.retire_slots / (4 * cycles) for Retiring
found event cycles
found event uops_retired.retire_slots
metric expr 1 - ( (idq_uops_not_delivered.core / (4 * cycles)) + (( uops_issued.any - uops_retired.retire_slots + 4 * int_misc.recovery_cycles ) / (4 * cycles)) + (uops_retired.retire_slots / (4 * cycles)) ) for Backend_Bound
found event uops_issued.any
found event cycles
found event idq_uops_not_delivered.core
found event int_misc.recovery_cycles
found event uops_retired.retire_slots
adding {cycles,uops_retired.retire_slots}:W,{uops_issued.any,cycles,idq_uops_not_delivered.core,int_misc.recovery_cycles,uops_retired.retire_slots}:W
uops_retired.retire_slots -> cpu/(null)=0x1e8483,umask=0x2,event=0xc2/
uops_issued.any -> cpu/(null)=0x1e8483,umask=0x1,event=0xe/
idq_uops_not_delivered.core -> cpu/(null)=0x1e8483,umask=0x1,event=0x9c/
int_misc.recovery_cycles -> cpu/(null)=0x1e8483,umask=0x1,event=0xd/
uops_retired.retire_slots -> cpu/(null)=0x1e8483,umask=0x2,event=0xc2/
Control descriptor is not initialized
cycles: 1648306 533003 533003
uops_retired.retire_slots: 1309840 533003 533003
uops_issued.any: 0 533003 0
cycles: 0 533003 0
idq_uops_not_delivered.core: 0 533003 0
int_misc.recovery_cycles: 0 533003 0
uops_retired.retire_slots: 0 533003 0

Performance counter stats for 'sleep 1':

1,648,306 cycles
# 0.20 Retiring
1,309,840 uops_retired.retire_slots
<not counted> uops_issued.any (0.00%)
<not counted> cycles (0.00%)
<not counted> idq_uops_not_delivered.core (0.00%)
<not counted> int_misc.recovery_cycles (0.00%)
<not counted> uops_retired.retire_slots (0.00%)

1.000942715 seconds time elapsed

0.000954000 seconds user
0.000000000 seconds sys

The events in group usually have to be from the same PMU. Try reorganizing the group.
john@localhost:~/acme/tools/perf>
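
As a sanity check on the one metric that was reported, the Retiring value can be recomputed by hand from the raw counts above using the expression printed in the verbose output, uops_retired.retire_slots / (4 * cycles). A small Python sketch (the function name `retiring` is only for illustration):

```python
def retiring(retire_slots, cycles):
    """Retiring = uops_retired.retire_slots / (4 * cycles), as in the metric expr."""
    return retire_slots / (4 * cycles)


# Raw counts from the first (working) group in the output above.
value = retiring(retire_slots=1309840, cycles=1648306)
print(f"{value:.2f}")  # prints 0.20
```

So the first group is fine. Backend_Bound cannot be evaluated the same way, because every event in the second group came back as not counted.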


> Thanks for reporting the issues. I planned to look at this logic to
> fix metric-no-group, it'd be nice to land:
> https://lore.kernel.org/lkml/20210112230434.2631593-1-irogers@xxxxxxxxxx/
> just so that I'm not making patch sets that conflict with myself.

As I said, one of the issues was caused by me, and I can send a fix for it, although I need to test more. I was holding off until an approach had been decided for the second issue. Since there is no resolution yet, I think I'll just send a fix today.

Thanks,
John