Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events

From: Stephane Eranian
Date: Fri Oct 01 2021 - 14:22:48 EST


On Fri, Oct 1, 2021 at 3:44 AM James Clark <james.clark@xxxxxxx> wrote:
>
>
>
> On 30/09/2021 19:47, Stephane Eranian wrote:
> > On Thu, Sep 23, 2021 at 9:02 AM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> >>
> >> Hi Leo,
> >>
> >> On Thu, Sep 23, 2021 at 7:23 AM Leo Yan <leo.yan@xxxxxxxxxx> wrote:
> >>>
> >>> Hi Namhyung,
> >>>
> >>> On Thu, Sep 16, 2021 at 02:01:21PM -0700, Namhyung Kim wrote:
> >>>
> >>> [...]
> >>>
> >>>>> We previously discussed enabling PID/TID for SPE samples; in the patch
> >>>>> set [1], patches 07 and 08 set the sample's pid/tid based on the Arm SPE
> >>>>> context packets. To enable hardware tracing of the context ID, you also
> >>>>> need to enable the kernel config CONFIG_PID_IN_CONTEXTIDR.
> >>>>
> >>>> Thanks for sharing this.
> >>>>
> >>>> Yeah, I also looked at the context info, but depending on a kconfig
> >>>> option seems to limit its functionality. Also, the kconfig help says it
> >>>> adds some overhead in the critical path (even if perf is not running,
> >>>> right?), but I'm not sure how much.
> >>>
> >>> Yes, once CONFIG_PID_IN_CONTEXTIDR is enabled, the kernel always
> >>> writes the PID into the CONTEXTIDR system register on every process
> >>> context switch. Please see the flow:
> >>>
> >>> __switch_to() (arch/arm64/kernel/process.c)
> >>> `-> contextidr_thread_switch(next)
> >>
> >> Thanks for the info. I assume it's a light-weight operation.
> >>
> >>
> > I'd like to understand why having SPE record the PID was believed to
> > be too expensive, compared with what I am seeing from tracking all the
> > context switches and the volume of data that generates.
> >
>
> I think the justification for it being expensive is that when PID_IN_CONTEXTIDR
> is set, there are a few extra instructions to write that value on every context
> switch, whether SPE is enabled or not. So it has a system-wide impact.

You could use a static key to make this conditional on SPE running on
the CPU, as is done for other PMU features on other architectures.
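
As a rough illustration of the static-key idea, here is a userspace model,
not kernel code. In the kernel the gate would be a real static key
(DEFINE_STATIC_KEY_FALSE() tested with static_branch_unlikely(), flipped with
static_branch_inc()/static_branch_dec()), so the disabled case costs a patched
no-op rather than a load and compare. Every name below (spe_users,
fake_contextidr, contextidr_writes, spe_event_start(), spe_event_stop(),
contextidr_thread_switch_model()) is hypothetical and exists only for this
sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a static key: counts active SPE events. */
static int spe_users;

/* Hypothetical stand-ins for the CONTEXTIDR register and a write counter. */
static uint32_t fake_contextidr;
static unsigned long contextidr_writes;

/* Models static_branch_unlikely(): true only while SPE is in use. */
static inline bool spe_key_enabled(void)
{
	return spe_users > 0;
}

/* Models static_branch_inc() when an SPE event starts. */
void spe_event_start(void)
{
	spe_users++;
}

/* Models static_branch_dec() when an SPE event stops. */
void spe_event_stop(void)
{
	if (spe_users > 0)
		spe_users--;
}

/* Models the context-switch hook: skip the write unless SPE is active. */
void contextidr_thread_switch_model(uint32_t next_pid)
{
	if (!spe_key_enabled())
		return;

	fake_contextidr = next_pid;
	contextidr_writes++;
}
```

With this shape, context switches pay for the register write only while at
least one SPE event is active. A real implementation would presumably also
have to write the current task's PID at the moment the key is enabled, since
no context switch has recorded it yet.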