Re: [PATCH v5 2/5] perf: Add SNOOP_PEER flag to perf mem data struct

From: Liang, Kan
Date: Wed Apr 27 2022 - 15:30:11 EST




On 4/27/2022 12:19 PM, Leo Yan wrote:
Hi Kan,

On Mon, Apr 25, 2022 at 01:01:40PM -0400, Liang, Kan wrote:


On 4/24/2022 7:43 AM, Leo Yan wrote:
On Sat, Apr 23, 2022 at 05:53:28AM -0700, Andi Kleen wrote:

Besides meaning a snoop of an unmodified cache line, SNOOPX_FWD also means
that it is cache coherency traffic from a *remote* socket. This is quite
different from how we define SNOOPX_PEER, which only snoops from a peer CPU
or cluster.


The FWD doesn't have to be *remote*. The definition you quoted is just for
the "L3 Miss", which is indeed a remote forward. But we still have
cross-core FWD. See Table 19-101.

Actually, X86 uses the PERF_MEM_REMOTE_REMOTE + PERF_MEM_SNOOPX_FWD to
indicate the remote FWD, not just SNOOPX_FWD.
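
For illustration, a minimal sketch of how that combination could be packed
into and tested in a data_src-style 64-bit value. The shift and flag values
are local copies of what I believe include/uapi/linux/perf_event.h defines
(an assumption, please check your header), and make_data_src() and
is_remote_fwd() are hypothetical helpers, not perf APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the uapi values (assumed; verify against perf_event.h). */
#define MEM_REMOTE_SHIFT  37
#define MEM_REMOTE_REMOTE 0x01ULL
#define MEM_SNOOPX_SHIFT  38
#define MEM_SNOOPX_FWD    0x01ULL

/* Hypothetical helper: build a value from the remote and snoopx fields. */
uint64_t make_data_src(uint64_t remote, uint64_t snoopx)
{
	assert(remote <= 0x1);	/* mem_remote is 1 bit wide */
	assert(snoopx <= 0x3);	/* mem_snoopx is 2 bits wide */
	return (remote << MEM_REMOTE_SHIFT) | (snoopx << MEM_SNOOPX_SHIFT);
}

/* Hypothetical helper: a remote FWD needs both the remote bit and FWD. */
int is_remote_fwd(uint64_t data_src)
{
	uint64_t remote = (data_src >> MEM_REMOTE_SHIFT) & 0x1;
	uint64_t snoopx = (data_src >> MEM_SNOOPX_SHIFT) & 0x3;

	return remote == MEM_REMOTE_REMOTE && (snoopx & MEM_SNOOPX_FWD) != 0;
}
```

With this sketch, SNOOPX_FWD without the remote bit is not treated as a
remote forward, which is the distinction made above.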

Thanks a lot for the info.

If there is no objection, I would prefer to keep the new snoop type
SNOOPX_PEER. This would make it easier for us to distinguish the semantics
and to support the statistics for SNOOPX_FWD and SNOOPX_PEER separately.

I overlooked the flag SNOOPX_FWD; thanks a lot to Kan for the reminder.

Yes, it seems better to keep using a separate flag if they don't exactly match.


Yes, I agree with Andi. If you still think the existing flag combination
doesn't match your requirement, a new separate flag should be introduced.
I'm not familiar with ARM. I think I will leave it to you and the maintainer
to decide.

It's a bit difficult for me to make a decision because SNOOPX_FWD is
currently not used in the file util/mem-events.c, so I am not sure whether
SNOOPX_FWD is used consistently across different arches.

No, it is used in the file util/mem-events.c.
See perf_mem__snp_scnprintf().


On the other hand, I sent a patch for 'peer' flag statistics [1]; you
could review it. It only collects statistics for the L2 and L3 cache
levels on the local node.

If it's for the local node, why don't you use the hop level that was
recently introduced by Power? The combination below seems a good fit.

PERF_MEM_LVLNUM_ANY_CACHE | PERF_MEM_HOPS_0?

/* hop level */
#define PERF_MEM_HOPS_0      0x01 /* remote core, same node */
#define PERF_MEM_HOPS_1      0x02 /* remote node, same socket */
#define PERF_MEM_HOPS_2      0x03 /* remote socket, same board */
#define PERF_MEM_HOPS_3      0x04 /* remote board */
/* 5-7 available */
#define PERF_MEM_HOPS_SHIFT  43
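
As a rough sketch of what that encoding could look like in a 64-bit
data_src value: the LVLNUM shift and ANY_CACHE value are local copies of
what I believe the uapi header defines (an assumption, please verify), and
encode_cache_hops(), hops_of() and lvlnum_of() are hypothetical helpers
used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the uapi values (assumed; verify against perf_event.h). */
#define MEM_LVLNUM_SHIFT     33
#define MEM_LVLNUM_ANY_CACHE 0x0bULL
#define MEM_HOPS_SHIFT       43
#define MEM_HOPS_0           0x01ULL	/* remote core, same node */

/* Hypothetical helper: encode "any cache" plus a 3-bit hop level. */
uint64_t encode_cache_hops(uint64_t hops)
{
	assert(hops <= 0x7);	/* mem_hops is 3 bits wide */
	return (MEM_LVLNUM_ANY_CACHE << MEM_LVLNUM_SHIFT) |
	       (hops << MEM_HOPS_SHIFT);
}

/* Hypothetical helper: extract the hop level field. */
unsigned int hops_of(uint64_t data_src)
{
	return (unsigned int)((data_src >> MEM_HOPS_SHIFT) & 0x7);
}

/* Hypothetical helper: extract the 4-bit memory level number field. */
unsigned int lvlnum_of(uint64_t data_src)
{
	return (unsigned int)((data_src >> MEM_LVLNUM_SHIFT) & 0xf);
}
```

Here encode_cache_hops(MEM_HOPS_0) would stand for a hit in some cache of
another core on the same node.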

Thanks,
Kan


The main purpose of this email is to ask whether you think FWD can be
made consistent for both arches, and whether the newly added display mode
would also be useful for x86 (we could rename it to the 'fwd' display
mode). If so, I am very glad to unify the flag.

Thanks,
Leo

[1] https://lore.kernel.org/lkml/20220427155013.1833222-5-leo.yan@xxxxxxxxxx/