On Tue, Mar 03, 2020 at 08:40:10PM -0500, Liang, Kan wrote:
> > I'm thinking this is wrong.
> >
> > That is, yes, this fixes the observed problem, but it also misses at
> > least one other site. Which seems to suggest we ought to take a
> > different approach.
> >
> > But even with that, I wonder if the actual condition isn't wrong.
> > Suppose the event was exclusive, and other events weren't scheduled
> > because of that. Then you disable the one exclusive event _and_ kill
> > rotation, so then nothing else will ever get on.
> > So what I think was supposed to happen is rotation killing itself;
> > rotation will schedule out the context -- which will clear the flag, and
> > then schedule the thing back in -- which will set the flag again when
> > needed.
> >
> > Now, that isn't happening, and I think I see why: when we drop
> > to !nr_active, we terminate ctx_sched_out() before we get to clearing
> > the flag, oops!
> > So how about something like this?
> >
> > ---
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index e453589da97c..7947bd3271a9 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -2182,6 +2182,7 @@ __perf_remove_from_context(struct perf_event *event,
> >  	if (!ctx->nr_events && ctx->is_active) {
> >  		ctx->is_active = 0;
> > +		ctx->rotate_necessary = 0;
> >  		if (ctx->task) {
> >  			WARN_ON_ONCE(cpuctx->task_ctx != ctx);
> >  			cpuctx->task_ctx = NULL;
> The patch can fix the observed problem with the uncore PMU.
>
> But it cannot fix all the cases with the core PMU, especially when the
> NMI watchdog is enabled: ctx->nr_events is never 0 while the NMI
> watchdog is enabled.
But I'm confused: why do we care about nr_events == 0? The below: vvvv
> > @@ -3074,15 +3075,15 @@ static void ctx_sched_out(struct perf_event_context *ctx,
> >  
> >  	is_active ^= ctx->is_active; /* changed bits */
> >  
> > -	if (!ctx->nr_active || !(is_active & EVENT_ALL))
> > -		return;
> > -
> >  	/*
> >  	 * If we had been multiplexing, no rotations are necessary, now no events
> >  	 * are active.
> >  	 */
> >  	ctx->rotate_necessary = 0;
> >  
> > +	if (!ctx->nr_active || !(is_active & EVENT_ALL))
> > +		return;
> > +
> >  	perf_pmu_disable(ctx->pmu);
> >  	if (is_active & EVENT_PINNED) {
> >  		list_for_each_entry_safe(event, tmp, &ctx->pinned_active, active_list)
That makes sure we clear the flag when we ctx_sched_out(), and as long
as ctx->rotate_necessary is set, perf_rotate_context() will do exactly
that.

Then ctx_sched_in() will re-set the flag if it failed to schedule a
counter.
So where is that going wrong?