Re: round-robining per-cpu counters

From: Ingo Molnar
Date: Tue May 05 2009 - 02:41:36 EST



* Paul Mackerras <paulus@xxxxxxxxx> wrote:

> It used to be, and as far as I can see still is, the case that
> per-cpu counters take priority over per-task counters by virtue of
> being scheduled in first. That is, if you have N hardware
> counters and >= N per-cpu counters, then no per-task counters will
> ever get scheduled onto the PMU.
>
> That being the case, I don't see what the point of having the
> perf_reserved_percpu variable is. It doesn't do anything except
> set cpuctx->max_pertask, which isn't actually used anywhere. In
> any case with the current counter scheduling system there's no
> need to "reserve" hardware counters for use by per-cpu counters
> since any new per-cpu counters will just bump existing per-task
> counters off - if not immediately then the next time that
> perf_counter_task_tick gets called.
>
> What was the intended meaning of perf_reserved_percpu? I presume
> it was that there would always be that many hardware counters
> available for per-cpu counters regardless of how many per-task
> counters there are. But that doesn't answer the complementary
> question - how many hardware counters can we rely on being
> available for per-task counters? At the moment the answer is 0,
> but I don't think that is a good answer.
>
> Does anyone have any good ideas about what the scheduling policy
> should be?

The reservation mechanism really suffered from not being used by
anything or anyone, and it thus bit-rotted across 300 follow-on
commits.

What would be the primary use case? Allow the admin to set aside
(and guarantee) space for task counters? Allow the admin to 'force'
exclusivity of counter ownership?

I think a better general solution would be to have a single
round-robin list for all currently active counters (both percpu and
task counters) - and fairly round-robin all of them. The scaling
information makes it obvious when this is happening.
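
To make the idea concrete, here is a rough userspace sketch of such a
single round-robin list. It is not kernel code: NUM_HW_SLOTS, struct
counter, tick() and scaled_count() are all made-up names, and a fixed
two-slot PMU is an assumption made only for illustration. Per-cpu and
per-task counters sit in one ring and the scheduler never looks at
which kind a counter is. The enabled/running times are what would let
the user see that multiplexing happened and scale the raw count.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_HW_SLOTS 2	/* hypothetical PMU size, for illustration only */

struct counter {
	const char *name;
	int is_percpu;		/* deliberately ignored by the scheduler */
	uint64_t raw_count;	/* events seen while actually on the PMU */
	uint64_t time_enabled;	/* ticks the counter was active */
	uint64_t time_running;	/* ticks the counter was on the PMU */
};

/*
 * One scheduler tick: the NUM_HW_SLOTS counters starting at *head get
 * the hardware, everyone accrues enabled time, then the ring rotates
 * by one so the next tick starts with the following counter.
 */
static void tick(struct counter *c, int n, int *head, uint64_t events)
{
	int on_pmu = n < NUM_HW_SLOTS ? n : NUM_HW_SLOTS;
	int i;

	assert(n > 0);

	for (i = 0; i < n; i++)
		c[i].time_enabled++;

	for (i = 0; i < on_pmu; i++) {
		struct counter *r = &c[(*head + i) % n];

		r->time_running++;
		r->raw_count += events;
	}

	*head = (*head + 1) % n;
}

/* Estimate of the full count: raw * enabled / running. */
static uint64_t scaled_count(const struct counter *c)
{
	if (c->time_running == 0)
		return 0;
	return c->raw_count * c->time_enabled / c->time_running;
}
```

With three counters on two slots, six ticks give each counter four
ticks on the PMU, so time_running < time_enabled for all of them and
none of them is starved, whichever type it is.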

If the admin wants stronger ownership of counters then the
pinned/exclusive attribute can be used.

We really want to keep the counter-scheduler simple, and we also
want to make the default as permissive as possible.

Ingo