Re: [PATCH 00/25] perf intel-pt: Add support for Event Trace and TNT disable

From: Arnaldo Carvalho de Melo
Date: Wed Feb 16 2022 - 04:46:36 EST


Em Wed, Feb 16, 2022 at 08:12:08AM +0200, Adrian Hunter escreveu:
> On 15/02/2022 21:39, Arnaldo Carvalho de Melo wrote:
> > Em Tue, Feb 15, 2022 at 11:27:34AM -0300, Arnaldo Carvalho de Melo escreveu:
> >> Em Mon, Jan 24, 2022 at 10:41:36AM +0200, Adrian Hunter escreveu:
> >>> Hi
> >>>
> >>> Brief documentation is in the last patch with more technical details in
> >>> Intel SDM (https://www.intel.com/sdm) version 076.
> >>>
> >>> The patches touch mostly Intel PT and auxtrace code, with a few supporting
> >>> changes to perf script.
> >>>
> >>> I am not aware of any hardware that has the new capabilities yet.
> >>>
> >>> Patches can also be found here:
> >>>
> >>> https://github.com/ahunter6/linux/tree/event-trace
> >>>
> >>
> >> Thanks, applied.
> >
> > So there were some problems in some compilers:
> >
> > [perfbuilder@five ~]$ grep FAIL dm.log/summary
> > 1 51.67 almalinux:8 : FAIL clang version 12.0.1 (Red Hat 12.0.1-4.module_el8.5.0+1025+93159d6c)
> > 2 28.48 alpine:3.4 : FAIL clang version 3.8.0 (tags/RELEASE_380/final)
> > 3 28.89 alpine:3.5 : FAIL clang version 3.8.1 (tags/RELEASE_381/final)
> > 4 30.20 alpine:3.6 : FAIL clang version 4.0.0 (tags/RELEASE_400/final)
> > 5 31.39 alpine:3.7 : FAIL gcc version 6.4.0 (Alpine 6.4.0)
> > 6 31.59 alpine:3.8 : FAIL gcc version 6.4.0 (Alpine 6.4.0)
> > 7 32.90 alpine:3.9 : FAIL gcc version 8.3.0 (Alpine 8.3.0)
> > 8 38.92 alpine:3.10 : FAIL gcc version 8.3.0 (Alpine 8.3.0)
> > 9 43.02 alpine:3.11 : FAIL gcc version 9.3.0 (Alpine 9.3.0)
> > 10 55.58 alpine:3.12 : FAIL gcc version 9.3.0 (Alpine 9.3.0)
> > 11 59.70 alpine:3.13 : FAIL gcc version 10.2.1 20201203 (Alpine 10.2.1_pre1)
> > 12 56.18 alpine:3.14 : FAIL gcc version 10.3.1 20210424 (Alpine 10.3.1_git20210424)
> > 13 55.98 alpine:3.15 : FAIL gcc version 10.3.1 20211027 (Alpine 10.3.1_git20211027)
> > 14 57.38 alpine:edge : FAIL gcc version 11.2.1 20211128 (Alpine 11.2.1_git20211128)
> > 15 26.99 alt:p8 : FAIL clang version 3.8.0 (tags/RELEASE_380/final)
> > 16 44.66 alt:p9 : FAIL clang version 10.0.0
> > 17 44.95 alt:p10 : FAIL clang version 11.0.1
> > 18 44.74 alt:sisyphus : FAIL gcc version 11.2.1 20211202 (ALT Sisyphus 11.2.1-alt2) (GCC)
> > 19 27.89 amazonlinux:1 : FAIL clang version 3.6.2 (tags/RELEASE_362/final)
> > 20 45.55 amazonlinux:2 : FAIL clang version 11.1.0 (Amazon Linux 2 11.1.0-1.amzn2.0.2)
> > 21 35.92 archlinux:base : FAIL clang version 13.0.0
> > 22 37.83 centos:8 : FAIL clang version 11.0.1 (Red Hat 11.0.1-1.module_el8.4.0+966+2995ef20)
> > 23 51.37 centos:stream : FAIL clang version 12.0.1 (Red Hat 12.0.1-2.module_el8.6.0+937+1cafe22c)
> > 25 30.89 debian:9 : FAIL clang version 3.8.1-24 (tags/RELEASE_381/final)
> > 26 29.59 debian:10 : FAIL clang version 7.0.1-8+deb10u2 (tags/RELEASE_701/final)
> > 27 41.13 debian:11 : FAIL gcc version 10.2.1 20210110 (Debian 10.2.1-6)
> > 28 47.06 debian:experimental : FAIL gcc version 11.2.0 (Debian 11.2.0-14)
> > 34 29.28 fedora:23 : FAIL clang version 3.7.0 (tags/RELEASE_370/final)
> > 35 38.12 fedora:24 : FAIL clang version 3.8.1 (tags/RELEASE_381/final)
> > 37 33.10 fedora:25 : FAIL clang version 3.9.1 (tags/RELEASE_391/final)
> > 38 40.72 fedora:26 : FAIL clang version 4.0.1 (tags/RELEASE_401/final)
> > 39 35.32 fedora:27 : FAIL clang version 5.0.2 (tags/RELEASE_502/final)
> > 40 38.51 fedora:28 : FAIL clang version 6.0.1 (tags/RELEASE_601/final)
> > 41 40.61 fedora:29 : FAIL clang version 7.0.1 (Fedora 7.0.1-6.fc29)
> > 42 41.63 fedora:30 : FAIL clang version 8.0.0 (Fedora 8.0.0-3.fc30)
> > 43 35.51 fedora:31 : FAIL clang version 9.0.1 (Fedora 9.0.1-4.fc31)
> > 44 45.26 fedora:32 : FAIL clang version 10.0.1 (Fedora 10.0.1-3.fc32)
> > 45 44.66 fedora:33 : FAIL clang version 11.0.0 (Fedora 11.0.0-3.fc33)
> > 46 46.36 fedora:34 : FAIL clang version 12.0.1 (Fedora 12.0.1-1.fc34)
> > 49 49.97 fedora:35 : FAIL clang version 13.0.0 (Fedora 13.0.0-3.fc35)
> > 50 6.78 fedora:rawhide : FAIL gcc version 12.0.1 20220205 (Red Hat 12.0.1-0) (GCC)
> > 51 44.84 gentoo-stage3:latest : FAIL clang version 13.0.0
> > 52 32.99 mageia:6 : FAIL clang version 3.9.1 (tags/RELEASE_391/final)
> > 53 39.01 mageia:7 : FAIL clang version 8.0.0 (Mageia 8.0.0-1.mga7)
> > 54 47.66 manjaro:base : FAIL clang version 13.0.0
> > 55 6.49 openmandriva:cooker : FAIL gcc version 11.2.0 20210728 (OpenMandriva) (GCC)
> > 56 46.54 opensuse:15.0 : FAIL clang version 5.0.1 (tags/RELEASE_501/final 312548)
> > 57 48.74 opensuse:15.1 : FAIL clang version 7.0.1 (tags/RELEASE_701/final 349238)
> > 58 54.96 opensuse:15.2 : FAIL clang version 9.0.1
> > 59 67.79 opensuse:15.3 : FAIL clang version 11.0.1
> > 60 67.69 opensuse:15.4 : FAIL clang version 11.0.1
> > 61 75.25 opensuse:tumbleweed : FAIL clang version 13.0.0
> > 62 50.15 oraclelinux:8 : FAIL clang version 12.0.1 (Red Hat 12.0.1-4.0.1.module+el8.5.0+20428+2b4ecd47)
> > 63 56.68 rockylinux:8 : FAIL clang version 12.0.1 (Red Hat 12.0.1-4.module+el8.5.0+715+58f51d49)
> > 64 38.11 ubuntu:16.04 : FAIL clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
> > 71 38.11 ubuntu:18.04 : FAIL clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
> > 82 41.43 ubuntu:20.04 : FAIL clang version 10.0.0-4ubuntu1
> > 84 30.21 ubuntu:20.10 : FAIL gcc version 10.3.0 (Ubuntu 10.3.0-1ubuntu1~20.10)
> > 85 39.23 ubuntu:21.04 : FAIL gcc version 10.3.0 (Ubuntu 10.3.0-1ubuntu1)
> > 86 1.07 ubuntu:21.04-x-mips64 : FAIL gcc version 10.3.0 (Ubuntu 10.3.0-1ubuntu1)
> > 87 40.23 ubuntu:21.10 : FAIL gcc version 11.2.0 (Ubuntu 11.2.0-7ubuntu2)
> > 88 52.77 ubuntu:22.04 : FAIL gcc version 11.2.0 (Ubuntu 11.2.0-14ubuntu1)
> > [perfbuilder@five ~]$
> >
> > :-)
> >
> > Sometimes its clang, sometimes its gcc, but mostly is this:
> >
> > [perfbuilder@five ~]$ tail -25 dm.log/ubuntu\:22.04
> > LD /tmp/build/perf/util/arm-spe-decoder/perf-in.o
> > CC /tmp/build/perf/util/clockid.o
> > CC /tmp/build/perf/util/list_sort.o
> > CC /tmp/build/perf/util/bpf-loader.o
> > util/intel-pt.c:2213:31: error: field 'cfe' with variable sized type 'struct perf_synth_intel_evt' not at the end of a struct or class is a GNU extension [-Werror,-Wgnu-variable-sized-type-not-at-end]
> > struct perf_synth_intel_evt cfe;
> > ^
> > CC /tmp/build/perf/util/bpf_map.o
> > CC /tmp/build/perf/util/bpf-prologue.o
> > CC /tmp/build/perf/util/symbol-elf.o
> > 1 error generated.
> > CC /tmp/build/perf/util/probe-file.o
> > make[4]: *** [/git/perf-5.17.0-rc3/tools/build/Makefile.build:97: /tmp/build/perf/util/intel-pt.o] Error 1
> > make[4]: *** Waiting for unfinished jobs....
> > LD /tmp/build/perf/libbpf/staticobjs/libbpf-in.o
> > LINK /tmp/build/perf/libbpf/libbpf.a
> > LD /tmp/build/perf/util/scripting-engines/perf-in.o
> > LD /tmp/build/perf/util/intel-pt-decoder/perf-in.o
> > make[3]: *** [/git/perf-5.17.0-rc3/tools/build/Makefile.build:139: util] Error 2
> > make[2]: *** [Makefile.perf:665: /tmp/build/perf/perf-in.o] Error 2
> > make[2]: *** Waiting for unfinished jobs....
> > make[1]: *** [Makefile.perf:240: sub-make] Error 2
> > make: *** [Makefile:70: all] Error 2
> > make: Leaving directory '/git/perf-5.17.0-rc3/tools/perf'
> > + exit 1
> > [perfbuilder@five ~]$[perfbuilder@five ~]$ tail -25 dm.log/ubuntu\:22.04
> > LD /tmp/build/perf/util/arm-spe-decoder/perf-in.o
> > CC /tmp/build/perf/util/clockid.o
> > CC /tmp/build/perf/util/list_sort.o
> > CC /tmp/build/perf/util/bpf-loader.o
> > util/intel-pt.c:2213:31: error: field 'cfe' with variable sized type 'struct perf_synth_intel_evt' not at the end of a struct or class is a GNU extension [-Werror,-Wgnu-variable-sized-type-not-at-end]
> > struct perf_synth_intel_evt cfe;
> > ^
> > CC /tmp/build/perf/util/bpf_map.o
> > CC /tmp/build/perf/util/bpf-prologue.o
> > CC /tmp/build/perf/util/symbol-elf.o
> > 1 error generated.
> > CC /tmp/build/perf/util/probe-file.o
> > make[4]: *** [/git/perf-5.17.0-rc3/tools/build/Makefile.build:97: /tmp/build/perf/util/intel-pt.o] Error 1
> > make[4]: *** Waiting for unfinished jobs....
> > LD /tmp/build/perf/libbpf/staticobjs/libbpf-in.o
> > LINK /tmp/build/perf/libbpf/libbpf.a
> > LD /tmp/build/perf/util/scripting-engines/perf-in.o
> > LD /tmp/build/perf/util/intel-pt-decoder/perf-in.o
> > make[3]: *** [/git/perf-5.17.0-rc3/tools/build/Makefile.build:139: util] Error 2
> > make[2]: *** [Makefile.perf:665: /tmp/build/perf/perf-in.o] Error 2
> > make[2]: *** Waiting for unfinished jobs....
> > make[1]: *** [Makefile.perf:240: sub-make] Error 2
> > make: *** [Makefile:70: all] Error 2
> > make: Leaving directory '/git/perf-5.17.0-rc3/tools/perf'
> > + exit 1
> > [perfbuilder@five ~]$
> >
> > I'm trying to fix this now, lets see...
>
> What about this?
>
> diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> index 320cbcd5b902..73a414643010 100644
> --- a/tools/perf/util/event.h
> +++ b/tools/perf/util/event.h
> @@ -295,6 +295,12 @@ struct perf_synth_intel_evd {
> u64 payload;
> };
>
> +/*
> + * Maximum number of event trace data in one go, assuming at most 1 per type
> + * and 6-bits of type in the EVD packet.
> + */
> +#define INTEL_PT_MAX_EVDS 64
> +

I ended up making it:

struct perf_synth_intel_evd evd[0];

And that made all compilers happy, but since there is an effort to
remove this in the kernel sources, I'll apply your patch on top of
everything, as I already pushed it out to my kernel.org perf/core
branch.

Thanks!

- Arnaldo

> /* Intel PT Event Trace */
> struct perf_synth_intel_evt {
> u32 padding;
> @@ -308,7 +314,7 @@ struct perf_synth_intel_evt {
> };
> u32 cfe;
> };
> - struct perf_synth_intel_evd evd[];
> + struct perf_synth_intel_evd evd[INTEL_PT_MAX_EVDS];
> };
>
> struct perf_synth_intel_iflag_chg {
> diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
> index e1d8f7504cbe..44822d041199 100644
> --- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
> +++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
> @@ -18,6 +18,7 @@
> #include <linux/zalloc.h>
>
> #include "../auxtrace.h"
> +#include "../event.h"
>
> #include "intel-pt-insn-decoder.h"
> #include "intel-pt-pkt-decoder.h"
> diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
> index efb2cb3ae0ca..fec832a7cf75 100644
> --- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
> +++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
> @@ -212,12 +212,6 @@ struct intel_pt_vmcs_info {
> bool error_printed;
> };
>
> -/*
> - * Maximum number of event trace data in one go, assuming at most 1 per type
> - * and 6-bits of type in the EVD packet.
> - */
> -#define INTEL_PT_MAX_EVDS 64
> -
> /* Event trace data from EVD packet */
> struct intel_pt_evd {
> int type;
> diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
> index 6fa76b584537..a00ca10f11a6 100644
> --- a/tools/perf/util/intel-pt.c
> +++ b/tools/perf/util/intel-pt.c
> @@ -2211,10 +2211,7 @@ static int intel_pt_synth_events_sample(struct intel_pt_queue *ptq)
> struct intel_pt *pt = ptq->pt;
> union perf_event *event = ptq->event_buf;
> struct perf_sample sample = { .ip = 0, };
> - struct {
> - struct perf_synth_intel_evt cfe;
> - struct perf_synth_intel_evd evd[INTEL_PT_MAX_EVDS];
> - } raw;
> + struct perf_synth_intel_evt raw;
> int i;
>
> if (intel_pt_skip_event(pt))
> @@ -2225,11 +2222,11 @@ static int intel_pt_synth_events_sample(struct intel_pt_queue *ptq)
> sample.id = ptq->pt->evt_id;
> sample.stream_id = ptq->pt->evt_id;
>
> - raw.cfe.type = ptq->state->cfe_type;
> - raw.cfe.reserved = 0;
> - raw.cfe.ip = !!(ptq->state->flags & INTEL_PT_FUP_IP);
> - raw.cfe.vector = ptq->state->cfe_vector;
> - raw.cfe.evd_cnt = ptq->state->evd_cnt;
> + raw.type = ptq->state->cfe_type;
> + raw.reserved = 0;
> + raw.ip = !!(ptq->state->flags & INTEL_PT_FUP_IP);
> + raw.vector = ptq->state->cfe_vector;
> + raw.evd_cnt = ptq->state->evd_cnt;
>
> for (i = 0; i < ptq->state->evd_cnt; i++) {
> raw.evd[i].et = 0;
>
>
>
> >
> > - Arnaldo
> >
> >
> >>
> >>> Adrian Hunter (25):
> >>> perf intel-pt: pkt-decoder-test: Fix scope of test_data
> >>> perf intel-pt: pkt-decoder: Remove misplaced linebreak
> >>> perf intel-pt: pkt-decoder: Add CFE and EVD packets
> >>> perf intel-pt: pkt-decoder: Add MODE.Exec IFLAG bit
> >>> perf intel-pt: decoder: Add config bit definitions
> >>> perf intel-pt: decoder: Factor out clearing of FUP event variables
> >>> perf intel-pt: decoder: Add CFE and EVD processing
> >>> perf intel-pt: decoder: Add MODE.Exec IFLAG processing
> >>> perf tools: Define Intel PT CFE / EVD event
> >>> perf tools: Define Intel PT iflag synthesized event
> >>> perf tools: Define new D and t flags
> >>> perf auxtrace: Add itrace option "I"
> >>> perf intel-pt: Record Event Trace capability flag
> >>> perf intel-pt: Synthesize CFE / EVD event
> >>> perf intel-pt: Synthesize iflag event
> >>> perf intel-pt: Synthesize new D and t flags
> >>> perf intel-pt: Force 'quick' mode when TNT is disabled
> >>> perf script: Display Intel PT CFE / EVD synthesized event
> >>> perf script: Display Intel PT iflag synthesized event
> >>> perf script: Display new D and t flags
> >>> perf scripts python: intel-pt-events.py: Add Event Trace
> >>> perf scripting python: Add all sample flags to DB export
> >>> perf scripts python: export-to-sqlite.py: Export all sample flags
> >>> perf scripts python: export-to-postgresql.py: Export all sample flags
> >>> perf intel-pt: Add documentation for Event Trace and TNT disable
> >>>
> >>> tools/perf/Documentation/itrace.txt | 2 +
> >>> tools/perf/Documentation/perf-intel-pt.txt | 104 ++++++++-
> >>> tools/perf/Documentation/perf-script.txt | 13 +-
> >>> .../arch/x86/tests/intel-pt-pkt-decoder-test.c | 17 +-
> >>> tools/perf/arch/x86/util/intel-pt.c | 7 +
> >>> tools/perf/builtin-script.c | 78 ++++++-
> >>> tools/perf/scripts/python/export-to-postgresql.py | 17 +-
> >>> tools/perf/scripts/python/export-to-sqlite.py | 19 +-
> >>> tools/perf/scripts/python/intel-pt-events.py | 55 ++++-
> >>> tools/perf/util/auxtrace.c | 4 +
> >>> tools/perf/util/auxtrace.h | 4 +
> >>> tools/perf/util/event.h | 45 +++-
> >>> .../perf/util/intel-pt-decoder/intel-pt-decoder.c | 245 +++++++++++++++++++--
> >>> .../perf/util/intel-pt-decoder/intel-pt-decoder.h | 21 ++
> >>> .../util/intel-pt-decoder/intel-pt-pkt-decoder.c | 47 +++-
> >>> .../util/intel-pt-decoder/intel-pt-pkt-decoder.h | 3 +
> >>> tools/perf/util/intel-pt.c | 164 +++++++++++++-
> >>> .../util/scripting-engines/trace-event-python.c | 3 +-
> >>> 18 files changed, 779 insertions(+), 69 deletions(-)
> >>>
> >>>
> >>> Regards
> >>> Adrian
> >>
> >> --
> >>
> >> - Arnaldo
> >

--

- Arnaldo