Re: [BUG] perf/x86/intel/pebs: PEBS timestamps overwritten

From: Ravi Bangoria
Date: Mon Aug 08 2022 - 05:29:42 EST


On 05-Aug-22 7:06 PM, Liang, Kan wrote:
>
>
> On 2022-08-05 6:49 a.m., Stephane Eranian wrote:
>> Hi,
>>
>> I was alerted by an internal user that the PEBS TSC-based timestamps
>> do not appear
>> correctly in the final perf.data output file from perf record.
>>
>> After some investigation, I came to the conclusion that indeed the
>> data->time field setup
>> by PEBS in the setup_pebs_fixed_sample_data() is later overwritten by
>> perf_events generic
>> code in perf_prepare_sample(). There is an ordering problem here.
>>
>> Looking around we found that this problem had been uncovered back in
>> May 2020 and a
>> patch had been posted then:
>> https://lore.kernel.org/lkml/e754b625-bf14-8f5f-bd1a-71d774057005@xxxxxxxxx/T/
>>
>> However this patch was never commented upon or committed.
>>
>> The problem is still present in the upstream code today.
>>
>> 1. perf_sample_data_init()
>> 2. setup_pebs_fixed_sample_data(): data->time =
>> native_sched_clock_from_tsc(pebs->tsc);
>> 3. perf_prepare_sample(): data->time = perf_event_clock(event);
>>
>> The patch from 2020 (Andreas Kogler) fixes the problem by making the
>> assignment in 3.
>> conditioned to the value of data->time being 0. Andreas also suggested
>> an alternative which
>> would break up the call to perf_event_ouput() like this is done in the
>> BTS code allowing
>> the prepare_sample() call to be made before PEBS samples are
>> extracted. That would
>> generate some code duplication. Although this approach appears more
>> robust, the one
>> issue I see is that prepare_sample may need data that would be filled
>> by PEBS and
>> therefore it would need to be called afterwards.
>>
>> Any better ideas?
>
> I think Andreas's patch is the most straightforward and simplest patch
> to fix the issue. But, if I recall correctly, Peter prefers to minimize
> the cachelines touched by the perf_sample_data_init(). So initializing
> the data->time in the perf_sample_data_init() seems break the rule.
>
> I think HW will provide more and more such kind of precise information.
> Maybe we can use a flag variable to track whether the information is
> already provided to avoid the overwritten.

fwiw, we had similar quirks in the past. For ex: __PERF_SAMPLE_CALLCHAIN_EARLY

Thanks,
Ravi