Re: [PATCH v2 3/3] perf report: Implement visual marker for macro fusion in annotate

From: Arnaldo Carvalho de Melo
Date: Mon Jun 19 2017 - 13:35:37 EST


Em Mon, Jun 19, 2017 at 10:55:58AM +0800, Jin Yao escreveu:
> For marking the fused instructions clearly, This patch adds a
> line before the first instruction of pair and joins it with the
> arrow of the jump.
>
> For example, when je is selected in annotate view, the line
> before cmpl is displayed and joins the arrow of je.
>
> â âââcmpl $0x0,argp_program_version_hook
> 81.93 â âââje 20
> â â lock cmpxchg %esi,0x38a9a4(%rip)
> â ââ jne 29
> â ââ jmp 43
> 11.47 â20:âââcmpxch %esi,0x38a999(%rip)

Ok, thanks for making this per-arch! Some comments:

I think we should have this marked permanently, i.e. not just when we go
to the jump line, something like this (testing here in a t450s
broadwell, function hc_find_func, /usr/lib64/liblzma.so.5.2.2):

It is like this now, when we are not on the jne jump line:

0.71 â mov %r14d,%r10d â
â movzbl (%rdx,%r10,1),%ebp â
1.06 â 70: mov (%r9,%rcx,4),%ecx â
77.98 â 74: cmp %bpl,(%rbx,%r10,1) â
â â jne 70 â
0.85 â movzbl (%rdx),%r10d â
0.99 â cmp %r10b,(%rbx) â

I think it should be augmented to:

0.71 â mov %r14d,%r10d â
â movzbl (%rdx,%r10,1),%ebp â
1.06 â 70: ââmov (%r9,%rcx,4),%ecx â
77.98 â 74: ââcmp %bpl,(%rbx,%r10,1) â
â â jne 70 â
0.85 â movzbl (%rdx),%r10d â
0.99 â cmp %r10b,(%rbx) â

I.e. no arrow, the two instructions that end up as one micro-op being
connected.

And then this:

â âââcmpl $0x0,argp_program_version_hook
81.93 â âââje 20
â â lock cmpxchg %esi,0x38a9a4(%rip)
â ââ jne 29
â ââ jmp 43
11.47 â20:âââcmpxch %esi,0x38a999(%rip)

Would look better as:

â âââcmpl $0x0,argp_program_version_hook
81.93 â âââje 20
â â lock cmpxchg %esi,0x38a9a4(%rip)
â ââ jne 29
â ââ jmp 43
11.47 â20:âââcmpxch %esi,0x38a999(%rip)

Patch below, please test/ack :-)

This was the low hanging fruit, having the:

1.06 â 70: ââmov (%r9,%rcx,4),%ecx â
77.98 â 74: ââcmp %bpl,(%rbx,%r10,1) â

Marker always there, not just when we have the cursor on top of one of
those lines remains to be coded.

But you state:

------------
Macro fusion merges two instructions to a single micro-op. Intel core
platform performs this hardware optimization under limited
circumstances.
------------

"Intel core", what about older arches, etc, don't you have to look at:

# cpudesc : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
# cpuid : GenuineIntel,6,61,4

present in the perf.data header (or in the running system, for things
like 'perf top') to make sure that this is a machine where such "macro
fusion" takes place?

- Arnaldo

diff --git a/tools/perf/ui/browser.c b/tools/perf/ui/browser.c
index acba636bd165..9ef7677ae14f 100644
--- a/tools/perf/ui/browser.c
+++ b/tools/perf/ui/browser.c
@@ -756,8 +756,10 @@ void ui_browser__mark_fused(struct ui_browser *browser, unsigned int column,
ui_browser__gotorc(browser, end_row, column);
SLsmg_draw_hline(2);
ui_browser__gotorc(browser, end_row + 1, column - 1);
- SLsmg_draw_vline(1);
+ SLsmg_write_char(SLSMG_LTEE_CHAR);
} else {
+ ui_browser__gotorc(browser, end_row, column - 1);
+ SLsmg_write_char(SLSMG_LTEE_CHAR);
ui_browser__gotorc(browser, end_row, column);
SLsmg_draw_hline(2);
}