Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events

From: Leo Yan
Date: Mon Oct 18 2021 - 09:23:42 EST


Hi German,

On Mon, Oct 18, 2021 at 12:01:27PM +0100, German Gomez wrote:
> Hi,
>
> What do you think of the patch below? PERF_RECORD_SWITCH events are also
> included for tracing forks. The patch would sit on top of Namhyung's.

Yeah, it's good to add PERF_RECORD_SWITCH.

> On 12/10/2021 12:07, German Gomez wrote:
> > Hi, Leo and Namhyung,
> >
> > I want to make sure I'm on the same page as you regarding this topic.
> >
> > [...]
> >
> > If we are not considering patching the driver at this stage, and so still
> > allow hardware tracing in non-root namespaces, I think we could proceed
> > like this:
> >
> >   - For userspace, always use context-switch events as they are
> >     accurate and consistent with namespaces.

I don't think you can always use context-switch events for userspace
samples. The underlying mechanism is that whenever a context-switch
event or a context packet arrives, perf invokes the function
machine__set_current_tid() to set the current pid/tid; afterwards, the
current pid/tid is retrieved in the function arm_spe_set_pid_tid_cpu().
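To make the mechanism concrete, here is a simplified, hypothetical model: a single "current tid" slot per CPU that whichever source arrives last overwrites. The struct and function names (tid_tracker, tracker_set_current_tid(), tracker_get_current_tid()) are made up for illustration and only mirror the idea behind machine__set_current_tid()/machine__get_current_tid(); this is not perf's implementation.

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical, simplified model of per-CPU "current thread" tracking. */
#define SKETCH_MAX_CPUS 8

struct tid_tracker {
	pid_t current_tid[SKETCH_MAX_CPUS]; /* -1 means "unknown" */
};

static void tracker_init(struct tid_tracker *t)
{
	for (int cpu = 0; cpu < SKETCH_MAX_CPUS; cpu++)
		t->current_tid[cpu] = -1;
}

/* Called when a context-switch event or a CONTEXT packet is seen. */
static void tracker_set_current_tid(struct tid_tracker *t, int cpu, pid_t tid)
{
	assert(cpu >= 0 && cpu < SKETCH_MAX_CPUS);
	t->current_tid[cpu] = tid; /* last writer wins */
}

/* Called when a sample is attributed to a thread. */
static pid_t tracker_get_current_tid(const struct tid_tracker *t, int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_MAX_CPUS);
	return t->current_tid[cpu];
}
```

Because there is only one slot per CPU in this model, a context-switch event and a CONTEXT packet arriving close together simply overwrite each other.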

The problem is that if we want to use the pid/tid info from both
context-switch events and context packets at the same time, it's hard
to maintain. E.g. we would need to create multiple thread contexts: one
to track pid info coming from context-switch events and another to
track pid info from context packets.

To simplify the code, I still think we should give context packets
priority and use them if they're available, and fall back to
context-switch events for pid/tid when context packets are not available.
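A minimal sketch of that priority order, assuming a record without a CONTEXT packet carries context_id == (u64)-1 (as the patch below checks) and that the tid last set by a context-switch event is passed in. pick_tid(), enum tid_source and SKETCH_NO_CONTEXT_ID are hypothetical names, not existing perf code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical marker for "no CONTEXT packet in this record". */
#define SKETCH_NO_CONTEXT_ID ((uint64_t)-1)

enum tid_source {
	TID_SOURCE_NONE,
	TID_SOURCE_CONTEXT_PKT,
	TID_SOURCE_SWITCH_EVENT,
};

/*
 * Prefer the CONTEXT packet value when the record carries one;
 * otherwise fall back to the tid from context-switch events
 * (-1 if none has been seen yet).
 */
static enum tid_source pick_tid(uint64_t context_id, pid_t switch_tid,
				pid_t *tid)
{
	assert(tid != NULL);

	if (context_id != SKETCH_NO_CONTEXT_ID) {
		*tid = (pid_t)context_id;
		return TID_SOURCE_CONTEXT_PKT;
	}
	if (switch_tid != -1) {
		*tid = switch_tid;
		return TID_SOURCE_SWITCH_EVENT;
	}
	*tid = -1;
	return TID_SOURCE_NONE;
}
```

Only one source feeds the tid for a given record, so a single thread context is enough in this scheme.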

> >   - For kernel tracing, if context packets are enabled, use them, but
> >     warn the user that the PIDs correspond to the root namespace.
> >   - Otherwise, use context-switch events and warn the user of the time
> >     inaccuracies.
> >
> > Later, if the driver is patched to disable context packets outside the
> > root namespace, kernel tracing could fall back to using context-switch
> > events and warn the user with a single message about the time
> > inaccuracies.
> >
> > If we are aligned, we could collect your feedback and share an updated
> > patch that considers the warnings.
> >
> > Many thanks
> > Best regards
>
> ---
>  tools/perf/util/arm-spe.c | 66 +++++++++++++++++++++++++++++++++++++--
>  1 file changed, 63 insertions(+), 3 deletions(-)
>
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index 708323d7c93c..6a2f7a484a80 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -71,6 +71,17 @@ struct arm_spe {
>      u64                kernel_start;
>  
>      unsigned long            num_events;
> +
> +    /*
> +     * Used for PID tracing.
> +     */
> +    u8                exclude_kernel;
> +
> +    /*
> +     * Warning messages.
> +     */
> +    u8                warn_context_pkt_namesapce;
> +    u8                warn_context_switch_ev_accuracy;
>  };
>  
>  struct arm_spe_queue {
> @@ -586,11 +597,42 @@ static bool arm_spe__is_timeless_decoding(struct arm_spe *spe)
>      return timeless_decoding;
>  }
>  
> +static bool arm_spe__is_exclude_kernel(struct arm_spe *spe) {
> +    struct evsel *evsel;
> +    struct evlist *evlist = spe->session->evlist;
> +
> +    evlist__for_each_entry(evlist, evsel) {
> +        if (evsel->core.attr.type == spe->pmu_type && evsel->core.attr.exclude_kernel)
> +            return true;
> +    }
> +
> +    return false;
> +}
> +
>  static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe,
>                      struct auxtrace_queue *queue)
>  {
>      struct arm_spe_queue *speq = queue->priv;
> -    pid_t tid;
> +    pid_t tid = machine__get_current_tid(spe->machine, speq->cpu);
> +    u64 context_id = speq->decoder->record.context_id;
> +
> +    /*
> +     * We're tracing the kernel.
> +     */
> +    if (!spe->exclude_kernel) {

This is incorrect ... 'exclude_kernel' is a session-wide setting, so
when kernel tracing is enabled (exclude_kernel is not set), perf will
always run the code below, for user space samples too.

I think here you want to avoid using context packets for user space
samples, but checking 'exclude_kernel' cannot achieve this, since it
doesn't tell us the mode (kernel mode or user mode) of an individual
sample.
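If the goal is to skip CONTEXT packets only for user space samples, the decision has to be made per sample. A minimal sketch, assuming the mode can be inferred by comparing the sample address with kernel_start (the struct arm_spe field visible in the diff); sample_is_kernel() and use_context_pkt() are hypothetical helpers, not existing perf functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical per-sample check: classify a sample as kernel or user
 * by its address, instead of the session-wide exclude_kernel attribute.
 */
static bool sample_is_kernel(uint64_t ip, uint64_t kernel_start)
{
	assert(kernel_start != 0); /* kernel_start must be known */
	return ip >= kernel_start;
}

/* Only use a CONTEXT packet for kernel samples that actually carry one. */
static bool use_context_pkt(uint64_t ip, uint64_t kernel_start,
			    uint64_t context_id)
{
	return sample_is_kernel(ip, kernel_start) &&
	       context_id != (uint64_t)-1;
}
```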

Thanks,
Leo

> +        /*
> +         * Use CONTEXT packets in kernel tracing if available and warn the user that the
> +         * values correspond to the root PID namespace.
> +         *
> +         * If CONTEXT packets aren't available but context-switch events are, warn the user
> +         * of the time inaccuracies.
> +         */
> +        if (context_id != (u64) -1) {
> +            tid = context_id;
> +            spe->warn_context_pkt_namesapce = true;
> +        } else if (tid != -1)
> +            spe->warn_context_switch_ev_accuracy = true;
> +    }
>  
>      tid = machine__get_current_tid(spe->machine, speq->cpu);
>      if (tid != -1) {
> @@ -740,7 +782,8 @@ static int arm_spe_process_event(struct perf_session *session,
>          if (err)
>              return err;
>  
> -        if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE)
> +        if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE ||
> +            event->header.type == PERF_RECORD_SWITCH)
>              err = arm_spe_context_switch(spe, event, sample);
>      }
>  
> @@ -807,7 +850,20 @@ static int arm_spe_flush(struct perf_session *session __maybe_unused,
>          return arm_spe_process_timeless_queues(spe, -1,
>                  MAX_TIMESTAMP - 1);
>  
> -    return arm_spe_process_queues(spe, MAX_TIMESTAMP);
> +    ret = arm_spe_process_queues(spe, MAX_TIMESTAMP);
> +
> +    if (spe->warn_context_pkt_namesapce)
> +        ui__warning(
> +            "Arm SPE CONTEXT packets used for PID/TID tracing.\n\n"
> +            "PID values correspond to the root PID namespace.\n\n");
> +
> +    if (spe->warn_context_switch_ev_accuracy)
> +        ui__warning(
> +            "No Arm SPE CONTEXT packets found within traces.\n\n"
> +            "Falling back to PERF_RECORD_SWITCH events for PID/TID tracing, which have\n"
> +            "workload-dependent timing inaccuracies.\n\n");
> +
> +    return ret;
>  }
>  
>  static void arm_spe_free_queue(void *priv)
> @@ -1083,6 +1139,10 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
>  
>      spe->timeless_decoding = arm_spe__is_timeless_decoding(spe);
>  
> +    spe->exclude_kernel = arm_spe__is_exclude_kernel(spe);
> +    spe->warn_context_pkt_namesapce = false;
> +    spe->warn_context_switch_ev_accuracy = false;
> +
>      /*
>       * The synthesized event PERF_RECORD_TIME_CONV has been handled ahead
>       * and the parameters for hardware clock are stored in the session
> --
> 2.17.1