Re: [PATCH] perf arm-spe: Use SPE data source for neoverse cores

From: Ali Saidi
Date: Fri Feb 04 2022 - 19:07:35 EST


Hi Leo,

On 2/3/22, 3:20 AM, "Leo Yan" <leo.yan@xxxxxxxxxx> wrote:
>[...]
>> >> >> + data_src.mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_HIT;
>> >> >
>> >> >This one also adds PERF_MEM_LVL_HIT even though the check of "if (record->type & ARM_SPE_LLC_MISS)"
>> >> >hasn't happened yet. Maybe some comments would make it a bit clearer, but at the moment it's
>> >> >not obvious how the result is derived because there are some things that don't add up like
>> >> >ARM_SPE_LLC_MISS == PERF_MEM_LVL_HIT.
>> >>
>> >> Assuming the above is correct, my reading of the existing code that creates the
>> >> c2c output is that when an access is marked as an LLC hit, that doesn't
>> >> necessarily mean that the data was present in the LLC. I don't see how it could
>> >> given that LLC_HIT + HITM means the line was dirty in another CPUs cache, and so
>> >> LLC_HIT + HITM seems to mean that it was a hit in the LLC snoop filter and
>> >> required a different core to provide the line. This and the above certainly
>> >> deserve a comment as to why the miss is being attributed in this way if it's
>> >> otherwise acceptable.
>> >
>> >As James pointed out, this might introduce confusion. I am wanderding
>> >if we can extract two functions for synthesizing the data source, one is
>> >for Neoverse platform and another is for generic purpose (which
>> >without data source packets), below code is to demonstrate the basic
>> >idea.
>>
>> The code below is cleaner, and I'm happy to rework the patches in this way, but
>> I think the question still remains about unifying behavior of the tool. If we
>> mark something with a data source of ARM_SPE_NV_PEER_CORE as at L1 hit + HITM
>> certainly c2c won't show the correct thing today, but i think it also hides the
>> intent. The line in question missed the L1, L2, and got to the LLC where we did
>> find a record that it was in another cores cache (L1 or L2). Looking at the way
>> that c2c works today, it seems like marking this as a hit in the LLC snoop
>> filter is the best way to unify behaviors among architectures?
>
>Thanks a lot for pointing out this. I looked into the code and
>compared the memory trace data from x86, I found the HITM tag is always
>sticking to LLC from x86's memory events. This would be the main reason
>why current code in perf is only support HITM for LLC.
>
>I don't think it's a good way to always mark LLC snoop, even it's a
>snooping operation in L1 or L2 cache on Arm64 platforms; this would
>introduce confusion for users when using Arm SPE for profiling.
>
>Alternatively, we can support HITM tag for L1/L2 cache levels in perf,
>this can allow us to match memory topology on Arm64 arch, and it should
>not introduce any regression on x86 arch.
>
>Could you confirm if below code can fix the issue or not?

Yes, that should do it. Want me to repsin this with the changes we discussed?

Ali