Re: [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load

From: Leo Yan
Date: Tue Oct 20 2020 - 04:18:52 EST


On Tue, Oct 20, 2020 at 05:13:01PM +0900, Namhyung Kim wrote:
> Hello,
>
> On Thu, Oct 15, 2020 at 11:51 PM Leo Yan <leo.yan@xxxxxxxxxx> wrote:
> >
> > If the memory event doesn't contain a HITM tag (as with Arm SPE), perf
> > cannot rely on the HITM display to report cache false sharing. As an
> > alternative, we can use the LLC access and multi-thread info to locate
> > the data address of the potential false sharing. If we then connect it
> > with the source code and analyze the threads' execution timing, and can
> > conclude that they load and store the same cache line at the same time,
> > this is helpful for resolving the cache false sharing issue.
> >
> > This patch set is to enable the display with sorting on LLC load
> > accesses; it adds dimensions for total LLC hit and LLC load accesses,
> > and these dimensions are used for shared cache line table and pareto.
> >
> > This patch set depends on the patch set "perf c2c: Refine the
> > organization of metrics" [1].
> >
> > [1] https://lore.kernel.org/patchwork/cover/1321499/
> >
> > With this patch set, we can get the 'llc' display as follows:
> >
> > # perf c2c report -d llc --coalesce tid,pid,iaddr,dso --stdio
>
> I'm not sure if you ran the test on x86 or ARM.
> IIUC ARM should have 0 local hitm, right?

Yes, on Arm64 both the local HITM and remote HITM counts are zero. Below is
the testing result on x86.

Thanks,
Leo

> > [...]
> >
> > =================================================
> > Shared Data Cache Line Table
> > =================================================
> > #
> > #        ----------- Cacheline ----------  LLC Hit   LLC Hit    Total    Total    Total  ---- Stores ----  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
> > # Index             Address  Node  PA cnt      Pct     Total  records    Loads   Stores    L1Hit   L1Miss       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
> > # .....  ..................  ....  ......  .......  ........  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
> > #
> >       0      0x563b01e83100     0    1401   65.32%       648     7011     3738     3273     2582      691      515     2516       59       143      505         0        0         0         0
> >       1      0x563b01e830c0     0       1   26.51%       263      400      400        0        0        0      130        3        4       262        1         0        0         0         0
> >       2      0x563b01e83080     0       1    7.76%        77      650      650        0        0        0      180      348       45        14       63         0        0         0         0
> >       3  0xffff88c3d74e82c0     0       1    0.10%         1        1        1        0        0        0        0        0        0         1        0         0        0         0         0
> >       4  0xffffa587c11e38c0   N/A       0    0.10%         1        2        1        1        1        0        0        0        0         1        0         0        0         0         0
> >       5  0xffffffffbd5e6fc0     0       1    0.10%         1        1        1        0        0        0        0        0        0         0        1         0        0         0         0
> >       6      0x7f90a4d6c2c0     0       1    0.10%         1        1        1        0        0        0        0        0        0         1        0         0        0         0         0
> >
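The two "LLC Hit" columns in the table above appear to follow directly from
the "LLC Load Hit" columns: in every row, Total equals LclHit + LclHitm
(143 + 505 = 648 for index 0), and Pct is that total divided by the sum over
the seven lines shown (648 / 992 = 65.32%). A minimal C sketch of that
arithmetic is below. The struct and function names are hypothetical and are
not taken from builtin-c2c.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical per-cacheline counters. The field names are illustrative
 * only and do not mirror the structures used by perf.
 */
struct cl_stats {
	unsigned int lcl_hit;	/* "LclHit" column: LLC load hit */
	unsigned int lcl_hitm;	/* "LclHitm" column: LLC load hit, modified */
};

/* "LLC Hit Total" for one cache line: LclHit + LclHitm. */
static unsigned int llc_hit_total(const struct cl_stats *s)
{
	return s->lcl_hit + s->lcl_hitm;
}

/* "LLC Hit Pct": share of line idx in the LLC load hits of all n lines. */
static double llc_hit_pct(const struct cl_stats *lines, size_t n, size_t idx)
{
	unsigned long sum = 0;
	size_t i;

	assert(idx < n);
	for (i = 0; i < n; i++)
		sum += llc_hit_total(&lines[i]);

	/* Avoid dividing by zero when no LLC load hit was sampled. */
	return sum ? 100.0 * llc_hit_total(&lines[idx]) / sum : 0.0;
}
```

Feeding the seven (LclHit, LclHitm) pairs from the table into llc_hit_pct()
gives the Pct column, which is the key the 'llc' display sorts on.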
> > =================================================
> > Shared Cache Line Distribution Pareto
> > =================================================
> > #
> > #        ---- LLC LD ----  -- Store Refs --  --------- Data address ---------                                                   ---------- cycles ----------    Total       cpu                                  Shared
> > #   Num   LclHit  LclHitm   L1 Hit  L1 Miss              Offset  Node  PA cnt      Pid                 Tid        Code address  rmt hitm  lcl hitm      load  records       cnt               Symbol             Object                  Source:Line  Node
> > # .....  .......  .......  .......  .......  ..................  ....  ......  .......  ..................  ..................  ........  ........  ........  .......  ........  ...................  .................  ...........................  ....
> > #
> >   -------------------------------------------------------------
> >       0      143      505     2582      691      0x563b01e83100
> >   -------------------------------------------------------------
> >           96.50%    7.72%   46.79%    0.00%                 0x0     0       1    14100       14102:lock_th      0x563b01c81c16         0      1949      1331     1876         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145     0
> >            0.00%   35.05%    0.00%    0.00%                 0x0     0       1    14100       14102:lock_th      0x563b01c81c1d         0      2651       975      748         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146     0
> >            0.00%   30.89%    0.00%    0.00%                 0x0     0       1    14100       14103:lock_th      0x563b01c81c1d         0      1425      1003      762         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146     0
> >            2.10%    7.52%   49.19%    0.00%                 0x0     0       1    14100       14103:lock_th      0x563b01c81c16         0      1585      1053     2037         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145     0
> >            0.00%    0.00%    2.52%   44.86%                 0x0     0       1    14100       14102:lock_th      0x563b01c81c28         0         0         0      375         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146     0
> >            0.00%    0.00%    1.51%   55.14%                 0x0     0       1    14100       14103:lock_th      0x563b01c81c28         0         0         0      420         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146     0
> >            1.40%   12.87%    0.00%    0.00%                0x20     0       1    14100    14104:reader_thd      0x563b01c81c73         0       166        99      417         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155     0
> >            0.00%    5.94%    0.00%    0.00%                0x20     0       1    14100    14105:reader_thd      0x563b01c81c73         0       144        85      376         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155     0
> >
> > [...]
> >
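In the Pareto output above, the lock_th threads load and store at offset 0x0
of cache line 0x563b01e83100 while the reader_thd threads only load at offset
0x20 of the same line, which is the pattern that suggests false sharing. As
a rough illustration of such a layout, here is a C11 sketch. It is not the
code of false_sharing_example.c; the struct names, the field names and the
64-byte line size are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_SIZE 64	/* assumed line size, not read from the target */

/* Hypothetical layout: both words live in the same 64-byte line. */
struct shared_line {
	_Alignas(CACHELINE_SIZE) long lock_word;	/* offset 0x0: loaded and stored */
	char pad[0x20 - sizeof(long)];
	long reader_word;				/* offset 0x20: only loaded */
};

/* One possible fix: give the read-only word a cache line of its own. */
struct split_lines {
	_Alignas(CACHELINE_SIZE) long lock_word;
	_Alignas(CACHELINE_SIZE) long reader_word;
};

static_assert(offsetof(struct shared_line, reader_word) == 0x20,
	      "reader_word is expected at offset 0x20");

/* Returns non-zero when both addresses fall into the same cache line. */
static int same_cacheline(const void *a, const void *b)
{
	return (uintptr_t)a / CACHELINE_SIZE == (uintptr_t)b / CACHELINE_SIZE;
}
```

With struct shared_line, stores to lock_word invalidate the line that the
readers of reader_word keep loading, so those loads can show up as LLC hits
on a line that several threads touch. With struct split_lines the two words
no longer share a line.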
> >
> > Leo Yan (8):
> >   perf mem: Add structure field c2c_stats::tot_llchit
> >   perf c2c: Add dimensions for total LLC hit
> >   perf c2c: Add dimensions for LLC load hit
> >   perf c2c: Change to general naming for macros
> >   perf c2c: Rename for shared cache line stats
> >   perf c2c: Refactor hist entry validation
> >   perf c2c: Add option '-d llc' for sorting with LLC load
> >   perf c2c: Update documentation for display option 'llc'
> >
> >  tools/perf/Documentation/perf-c2c.txt |  18 +-
> >  tools/perf/builtin-c2c.c              | 333 +++++++++++++++++++++-----
> >  tools/perf/util/mem-events.c          |   3 +
> >  tools/perf/util/mem-events.h          |   1 +
> >  4 files changed, 286 insertions(+), 69 deletions(-)
> >
> > --
> > 2.17.1
> >