Re: [PATCH] perf_events: improve Intel event scheduling

From: Paul Mackerras
Date: Mon Dec 21 2009 - 20:10:36 EST


On Fri, Dec 11, 2009 at 12:59:16PM +0100, stephane eranian wrote:

> There is a major difference between PPC and X86 here. PPC has a centralized
> register to control start/stop. This register uses bitmask to enable
> or disable counters. Thus, in hw_perf_enable(), if n_added=0, then you
> just need to use the pre-computed bitmask. Otherwise, you need to recompute
> the bitmask to include the new registers. The assignment of events and
> validation is done in hw_group_sched_in().

That's not entirely accurate. Yes, there is a global start/stop bit,
but there isn't a bitmask to enable or disable counters. There is a
selector bitfield for each counter (except the limited-function
counters) and you can set the selector to the 'count nothing' value if
you don't want a particular counter to count.

Validation is done in hw_group_sched_in() but not the assignment of
events to counters. That's done in hw_perf_enable(), via the
model-specific ppmu->compute_mmcr() call.

> In X86, assignment and validation is done in hw_group_sched_in(). Activation
> is done individually for each counter. There is no centralized register
> used here, thus no bitmask to update.
>
> Disabling a counter does not trigger a complete reschedule of events.
> This happens only when hw_group_sched_in() is called.
>
> The n_events = 0 in hw_perf_disable() is used to signal that something
> is changing. It should not be here but here.

The meaning of "It should not be here but here" is quite unclear to me.

> The problem is that hw_group_sched_in() needs a way to know that it is
> called for a completely new series of group scheduling so it can discard
> any previous assignment. This goes back to the issue I raised in my
> previous email. You could add a parameter to hw_group_sched_in() that
> would indicate this is the first group. that would cause n_events =0 and
> the function would start accumulating events for the new scheduling period.

I don't think hw_group_sched_in is ever called for a completely new
series of group scheduling. If you have per-cpu counters active, they
don't get scheduled out and in again with each task switch. So you
will tend to get a hw_perf_disable call, then a series of disable calls
for the per-task events for the old task, then a series of
hw_group_sched_in calls for the per-task events for the new task, then
a hw_perf_enable call.
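
To make that call sequence concrete, here is a toy user-space model of
it. This is only a sketch, not kernel code: every toy_* name, the
MAX_COUNTERS limit and the "reprograms" counter are invented for
illustration, and the feasibility check is reduced to "is there a free
counter". It keeps an array of the currently active events, checks
feasibility when a group is added, and defers the actual assignment of
events to counters until the PMU is re-enabled.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_COUNTERS 4			/* hypothetical number of hardware counters */

struct toy_event {
	int id;
	int counter;			/* assigned counter index, -1 if none */
};

struct toy_cpu {
	struct toy_event *active[MAX_COUNTERS];	/* all currently active events */
	size_t n_events;		/* number of entries in active[] */
	size_t n_added;			/* events added since the last enable */
	bool enabled;			/* models the global start/stop bit */
	unsigned reprograms;		/* how often counters were reassigned */
};

/* Stop the whole PMU; the active[] array is left untouched. */
void toy_pmu_disable(struct toy_cpu *c)
{
	c->enabled = false;
}

/* Remove one event (e.g. a per-task event of the outgoing task). */
void toy_event_del(struct toy_cpu *c, struct toy_event *ev)
{
	for (size_t i = 0; i < c->n_events; i++) {
		if (c->active[i] != ev)
			continue;
		for (size_t j = i + 1; j < c->n_events; j++)
			c->active[j - 1] = c->active[j];
		c->n_events--;
		ev->counter = -1;
		return;
	}
}

/* Add a group: check feasibility against ALL active events, assign nothing. */
bool toy_group_sched_in(struct toy_cpu *c, struct toy_event **grp, size_t n)
{
	if (c->n_events + n > MAX_COUNTERS)
		return false;
	for (size_t i = 0; i < n; i++)
		c->active[c->n_events++] = grp[i];
	c->n_added += n;
	return true;
}

/* Restart the PMU; reassign counters only if something was added. */
void toy_pmu_enable(struct toy_cpu *c)
{
	if (c->n_added > 0) {
		for (size_t i = 0; i < c->n_events; i++)
			c->active[i]->counter = (int)i;
		c->reprograms++;
		c->n_added = 0;
	}
	c->enabled = true;
}
```

In this model a task switch is toy_pmu_disable(), toy_event_del() for
each event of the old task, toy_group_sched_in() for each group of the
new task, then toy_pmu_enable(). A per-cpu event stays in active[]
across the whole sequence, so the group-add function never starts from
an empty set.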

On powerpc we maintain an array with pointers to all the currently
active events. That makes it easy to know at hw_perf_enable() time
what events need to be put on the PMU. Also it means that at
hw_group_sched_in time you can look at the whole set of events,
including the ones just added, to see if it's feasible to put them all
on. At that point we just check feasibility, which is quite quick and
easy using the bit-vector encoding of constraints. The bit-vector
encoding lets us represent multiple constraints of various forms in
one pair of 64-bit values per event. We can express constraints such
as "you can have at most N events in a class X" or "you can't have
events in all of classes A, B, C and D" or "control register bitfield
X must be set to Y", and then check that a set of events satisfies all
the constraints with some simple integer arithmetic. I don't know
exactly what constraints you have on x86 but I would be surprised if
you couldn't handle them the same way.
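
As a rough illustration of the idea (not the actual powerpc encoding,
which is more elaborate), the sketch below packs two kinds of
constraint into one mask/value pair per event. The bit layout, the
CLASS_X_* and SEL_* macros, struct ev_constraint and events_feasible()
are all made up for this example. Bits covered by the mask must agree
across all events ("control register bitfield X must be set to Y");
bits outside the mask are added up, and a biased starting value makes
the sum carry into a guard bit once more than N events of a class are
present ("at most N events in class X").

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout of the 64-bit constraint word:
 *   bits 0-2  : running count of class-X events (bit 2 is the guard bit)
 *   bits 8-11 : a control-register selector all events must agree on
 */
#define CLASS_X_ONE	(UINT64_C(1) << 0)
#define CLASS_X_GUARD	(UINT64_C(1) << 2)
#define CLASS_X_LIMIT	UINT64_C(2)	/* at most 2 class-X events */
#define CLASS_X_BIAS	(CLASS_X_GUARD - 1 - CLASS_X_LIMIT)
#define SEL_SHIFT	8
#define SEL_MASK	(UINT64_C(0xf) << SEL_SHIFT)
#define SEL(v)		((uint64_t)(v) << SEL_SHIFT)

struct ev_constraint {
	uint64_t mask;	/* bits of 'value' that must match across events */
	uint64_t value;	/* required bits under 'mask'; other bits are added */
};

/* Return true if the n events can all be on the PMU at the same time. */
bool events_feasible(const struct ev_constraint *ev, size_t n)
{
	uint64_t mask = 0, value = 0;	/* accumulated must-match fields */
	uint64_t sum = CLASS_X_BIAS;	/* accumulated counting fields */

	for (size_t i = 0; i < n; i++) {
		/* Must-match fields: conflict if a shared bit disagrees. */
		if ((value ^ ev[i].value) & mask & ev[i].mask)
			return false;
		mask |= ev[i].mask;
		value |= ev[i].value & ev[i].mask;

		/* Counting fields: exceeding the limit sets the guard bit. */
		sum += ev[i].value & ~ev[i].mask;
		if (sum & CLASS_X_GUARD)
			return false;
	}
	return true;
}
```

With this layout, an event in class X that needs selector value 3 would
be described as { SEL_MASK, SEL(3) | CLASS_X_ONE }. The check costs a
few integer operations per event, with no search over possible counter
assignments. A constraint of the form "not all of classes A, B, C and
D" is not covered by this sketch.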

Paul.