Re: [PATCH] perf/core: introduce context per CPU event list

From: Mark Rutland
Date: Thu Nov 10 2016 - 07:27:01 EST


On Thu, Nov 10, 2016 at 01:12:53PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 10, 2016 at 12:04:23PM +0000, Mark Rutland wrote:
> > On Thu, Nov 10, 2016 at 12:37:05PM +0100, Peter Zijlstra wrote:
>
> > > So the problem is finding which events are active when.

> > > If we stick all events in an RB-tree sorted on: {pmu,cpu,runtime} we
> > > can, fairly easily, find the relevant subtree and limit the iteration.
> > > Esp. if we use a threaded tree.
> >
> > That would cater for big.LITTLE, certainly, but I'm not sure I follow
> > how that helps to find active events -- you'll still have to iterate
> > through the whole PMU subtree to find which are active, no?
>
> Ah, so the tree would in fact only contain 'INACTIVE' events :-)

Ah. :)

That explains some of the magic, but...

> That is, when no events are on the hardware, all events (if there are
> any) are INACTIVE.
>
> Then on sched-in, we find the relevant subtree and linearly try to
> program all events from that subtree onto the PMU. Once an event fails
> to program, we stop (like we do now).
>
> These programmed events transition from INACTIVE to ACTIVE, and we take
> them out of the tree.
>
> Then on sched-out, we remove all events from the hardware, increase
> each event's runtime value by however long it was ACTIVE, flip them to
> INACTIVE and stuff them back in the tree.

... per the above, won't the tree also contain 'OFF' events (and
'ERROR', etc)?

... or do we keep them somewhere else (another list or sub-tree)?

If not, we still have to walk all of those in perf_iterate_ctx().
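
For my own understanding, the sort key for the INACTIVE tree would be
something roughly like the below? (Purely a sketch -- the helper name is
made up, and total_time_running is just my guess at the runtime
component.)

static bool perf_event_less(struct perf_event *a, struct perf_event *b)
{
	/* group by PMU first, then by CPU, then order by accrued runtime */
	if (a->pmu != b->pmu)
		return a->pmu < b->pmu;

	if (a->cpu != b->cpu)
		return a->cpu < b->cpu;

	return a->total_time_running < b->total_time_running;
}

If OFF/ERROR events end up under the same {pmu,cpu} key, any walk of
that subtree still has to skip over them, which is what I'm getting at
above.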

> (I can't quite recall if we can easily find ACTIVE events from a PMU,
> but if not, we can easily track those on a separate list).

I think we currently just iterate the perf_event_context::event_list and
check each event's state. Regardless, adding another list is fairly
simple.
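
E.g. something like the below (purely illustrative; the field names are
just for the sketch, not existing members):

static void event_mark_active(struct perf_event_context *ctx,
			      struct perf_event *event)
{
	/* the caller has just taken the event out of the INACTIVE tree */
	event->state = PERF_EVENT_STATE_ACTIVE;
	list_add_tail(&event->active_entry, &ctx->active_events);
}

static void event_mark_inactive(struct perf_event_context *ctx,
				struct perf_event *event)
{
	list_del_init(&event->active_entry);
	event->state = PERF_EVENT_STATE_INACTIVE;
	/* the caller re-inserts into the tree with the updated runtime */
}

Sched-out (and anything else that only cares about ACTIVE events) could
then walk ctx->active_events instead of the whole context.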

Thanks,
Mark.