Re: [6.1.7][6.2-rc5] perf all metrics test: FAILED!

From: Liang, Kan
Date: Wed Feb 01 2023 - 14:07:06 EST




On 2023-02-01 12:02 p.m., Ian Rogers wrote:
> On Wed, Feb 1, 2023 at 7:28 AM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
>>
>> Hi Ian,
>>
>> On 2023-01-30 10:55 p.m., Ian Rogers wrote:
>>>>> There's a question about what we should do in the perf test about
>>>>> this? I have a few solutions:
>>>>>
>>>>> 1) try metric tests again with the --metric-no-group flag and don't
>>>>> fail the test if this succeeds. This allows kernel bugs to hide, so
>>>>> I'm not a huge fan.
>>>>>
>>>>> 2) add a new metric flag/constraint to say not to group, this way the
>>>>> metric will automatically apply the "--metric-no-group" flag. It is a
>>>>> bit of work to wire this up but this kind of failure is common enough
>>>>> in PMUs that it is probably worthwhile. We also need to add the flag
>>>>> to metrics and I'm not sure how to get a good list of the metrics that
>>>>> currently fail and require it. This is okay but error prone.
>>>>>
>>>>> 3) fix the kernel bug and let the perf test fail until an adequate
>>>>> kernel is installed. Probably the best option.
>>>>>
>>>> Hi Ian,
>>>>
>>>> I can confirm:
>>>>
>>>> $ echo 0 | sudo tee /proc/sys/kernel/kptr_restrict
>>>> /proc/sys/kernel/perf_event_paranoid
>>>> 0
>>>>
>>>> $ ~/bin/perf stat -M tma_l3_bound --metric-no-group -a sleep 1
>>>>
>>>> Performance counter stats for 'system wide':
>>>>
>>>> 2.058.892 MEM_LOAD_UOPS_RETIRED.LLC_HIT # 1,5 % tma_l3_bound (99,30%)
>>>> 173.254.697 CYCLE_ACTIVITY.STALLS_L2_PENDING (99,10%)
>>>> 2.396.130.501 CPU_CLK_UNHALTED.THREAD (99,60%)
>>>> 1.110.486 MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS (99,53%)
>>>>
>>>> 1,001989022 seconds time elapsed
>>>>
>>>> $ ~/bin/perf stat -M tma_dram_bound --metric-no-group -a sleep 1
>>>>
>>>> Performance counter stats for 'system wide':
>>>>
>>>> 1.729.208 MEM_LOAD_UOPS_RETIRED.LLC_HIT # 1,2 % tma_dram_bound (99,50%)
>>>> 50.346.734 CYCLE_ACTIVITY.STALLS_L2_PENDING (99,50%)
>>>> 2.354.963.862 CPU_CLK_UNHALTED.THREAD (99,80%)
>>>> 306.500 MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS (99,61%)
>>>>
>>>> 1,001981392 seconds time elapsed
>>>>
>>>> Thanks!
>>> Thanks, apparently it is an issue with SandyBridge/IvyBridge that some
>>> counters on one hyperthread will limit what can be on the other. I
>>> believe that's the comment related to EXCL access here:
>>> https://github.com/torvalds/linux/blob/master/arch/x86/events/intel/core.c#L124
>>> So you may have more success with the metric if you disable
>>> hyperthreading, but I imagine that's not a popular option.
>>
>> Thanks for debugging the issue. Yes, it's caused by the HT workaround
>> for SNB/IVB/HSW.
>>
>> The weak group check in the kernel is in validate_group(). It only does
>> a sanity check; it doesn't check all the workarounds or the current
>> status of the counters (e.g., whether the fixed counter is occupied by
>> the NMI watchdog), so a false positive can be returned to the perf
>> tool. I once tried to fix the NMI watchdog check in the kernel, but the
>> proposal was rejected, which is why the metric constraint was introduced.
>>
>> For this issue, I think option 2 above is the better and more practical
>> choice. The issue is only observed on old machines, which usually have
>> a stable kernel running on them. I don't think users want to update
>> their kernel just to work around an issue for several metrics, but it
>> should be much easier for them to update the perf tool.
>>
>> We know that the below events are the problematic events.
>> /* MEM_UOPS_RETIRED.* */
>> /* MEM_LOAD_UOPS_RETIRED.* */
>> /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
>> /* MEM_LOAD_UOPS_LLC_MISS_RETIRED.* */
>> Can we update the convertor script and apply the "--metric-no-group"
>> flag or add a new constraint if the above events are detected in
>> SNB/IVB/HSW?
>>
>> Thanks,
>> Kan
>
> Thanks Kan,
>
> We absolutely can do that! In this case should it be --metric-no-group
> only when SMT is enabled? I can do some patches but would like to know
> about whether we need SMT and not SMT versions of --metric-no-group.

The kernel workaround is disabled when SMT is off, so I think we only
need the SMT version of --metric-no-group.
https://lore.kernel.org/all/1416251225-17721-13-git-send-email-eranian@xxxxxxxxxx/T/#u
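For reference, the SMT check could be as simple as reading the sysfs
knob (a sketch only; the exact path is the common location on recent
kernels, and treating a missing file as "SMT off" is an assumption for
illustration, not what the perf tool actually does):

```python
# Sketch: decide whether --metric-no-group is needed based on SMT state.
# A missing or unreadable sysfs file is treated as "SMT off" here, which
# is an assumption made for illustration.
def smt_enabled(path="/sys/devices/system/cpu/smt/active"):
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        return False
```

The tool would then only inject --metric-no-group when smt_enabled()
returns True, matching the kernel workaround being active only with SMT
on.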

> Also, should we just have a list of metrics that need the flag or try
> to automate detection?

I don't think Intel will update the metrics or events for the old
SNB/IVB/HSW platforms. Hard-coding a list of metrics may be simpler than
automated detection.
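A hard-coded table might look like the following. The CPUID model
strings follow the format perf uses for these platforms, but the metric
entries are illustrative examples, not a vetted list:

```python
# Sketch: map CPUID model strings to metrics that need --metric-no-group
# when SMT is on. The metric names listed are examples only.
NO_GROUP_METRICS = {
    "GenuineIntel-6-2A": {"tma_l3_bound", "tma_dram_bound"},  # SNB
    "GenuineIntel-6-3A": {"tma_l3_bound", "tma_dram_bound"},  # IVB
    "GenuineIntel-6-3C": {"tma_l3_bound", "tma_dram_bound"},  # HSW
}

def needs_no_group(cpuid, metric):
    """True if this metric should get --metric-no-group on this CPU."""
    return metric in NO_GROUP_METRICS.get(cpuid, set())
```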

> Some warts in detection are the names of the
> events that vary between Ivybridge and Sandybridge, and how to
> determine which events conflict. For example, the perfmon event data:
>
> MEM_LOAD_UOPS_RETIRED.LLC_HIT
> https://github.com/intel/perfmon/blob/main/IVB/events/ivybridge_core.json#L5368
> MEM_LOAD_UOPS_RETIRED.LLC_MISS
> https://github.com/intel/perfmon/blob/main/IVB/events/ivybridge_core.json#L5431
> CYCLE_ACTIVITY.STALLS_L2_PENDING
> https://github.com/intel/perfmon/blob/main/IVB/events/ivybridge_core.json#L3541
>

The problematic events should have the same name across platforms. If
the event name doesn't work, the event encoding is identical across
those platforms.
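If automated detection were attempted instead, matching the four
problematic event families listed earlier by name could look like this
(a sketch; real metrics reference many more events than shown):

```python
import re

# The four problematic event families from the thread:
# MEM_UOPS_RETIRED.*, MEM_LOAD_UOPS_RETIRED.*,
# MEM_LOAD_UOPS_LLC_HIT_RETIRED.*, MEM_LOAD_UOPS_LLC_MISS_RETIRED.*
PROBLEM_EVENTS = re.compile(
    r"^MEM_(UOPS_RETIRED|LOAD_UOPS_RETIRED|"
    r"LOAD_UOPS_LLC_HIT_RETIRED|LOAD_UOPS_LLC_MISS_RETIRED)\."
)

def metric_is_affected(event_names):
    """True if any event used by the metric is a problematic one."""
    return any(PROBLEM_EVENTS.match(e) for e in event_names)
```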


> The events list all counters, and there are no errata fields. Should
> the event data be updated and the converter script then handle that? If
> I get shown an example, I can modify the script accordingly.

If it helps the converter script, I think we can update the errata
field.

Here is the errata information:
* SNB: BJ122
* IVB: BV98
* HSW: HSD29

Here are the details regarding the issue (please search for BV98):
https://www.intel.com/content/www/us/en/content-details/619604/desktop-3rd-generation-intel-core-processor-family-specification-update.html
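With an "Errata" field present in the perfmon event JSON, the converter
could flag the affected events like this. The field name matches what
the perfmon files use elsewhere, but the converter logic shown is only a
sketch:

```python
import json

# Errata IDs from the thread that correspond to the HT workaround.
HT_ERRATA = {"BJ122", "BV98", "HSD29"}

def events_needing_no_group(json_text):
    """Return names of events whose errata list intersects HT_ERRATA."""
    events = json.loads(json_text)
    return {e["EventName"] for e in events
            if set(e.get("Errata", "").split(", ")) & HT_ERRATA}
```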
>
> It is also hard for me to test anything other than SMT on Ivybridge.
>

I think it's OK to test only on Ivybridge. The original kernel patch
indicates the issue is the same across SNB, IVB, and HSW.
https://lore.kernel.org/all/1416251225-17721-7-git-send-email-eranian@xxxxxxxxxx/T/#u

Thanks,
Kan