Re: [PATCH v7 09/11] sched: record per-cgroup number of contextswitches

From: Tejun Heo
Date: Thu Jun 06 2013 - 20:05:01 EST


Hello,

Maybe we should split the addition of the switch stats off into a
separate patch set? They're two separate things.

On Wed, May 29, 2013 at 03:03:20PM +0400, Glauber Costa wrote:
> @@ -3642,6 +3642,8 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev)
> prev->sched_class->put_prev_task(rq, prev);
>
> do {
> + if (likely(prev))
> + cfs_rq->nr_switches++;
> se = pick_next_entity(cfs_rq);
> set_next_entity(cfs_rq, se);
> cfs_rq = group_cfs_rq(se);
> @@ -3651,6 +3653,22 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev)
> if (hrtick_enabled(rq))
> hrtick_start_fair(rq, p);
>
> + /*
> + * This condition is extremely unlikely, and most of the time will just
> + * consist of this unlikely branch, which is extremely cheap. But we
> + * still need to have it, because when we first loop through cfs_rq's,
> + * we can't possibly know which task we will pick. The call to
> + * set_next_entity above is not meant to mess up the tree in this case,
> + * so this should give us the same chain, in the same order.
> + */
> + if (unlikely(p == prev)) {
> + se = &p->se;
> + for_each_sched_entity(se) {
> + cfs_rq = cfs_rq_of(se);
> + cfs_rq->nr_switches--;
> + }
> + }
> +

This concern may be fringe, but the above breaks the monotonically
increasing property of the stat. Depending on the timing, a very
unlucky consumer of the stat may see the counter go backward, which
can lead to nasty things. I'm not sure whether the fact that it'd be
very difficult to trigger is a pro or a con.
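
To make the failure mode concrete, here is a minimal userspace sketch,
not kernel code. It assumes a consumer that computes the delta between
two samples with unsigned subtraction, which is the usual idiom because
it stays correct across wraparound. The names fake_cfs_rq, sample(),
switches_since() and racy_delta() are hypothetical and exist only for
this illustration; only nr_switches comes from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-cgroup counter added by the patch. */
struct fake_cfs_rq {
	uint64_t nr_switches;
};

/* Hypothetical reader: takes a snapshot of the counter. */
static uint64_t sample(const struct fake_cfs_rq *cfs_rq)
{
	return cfs_rq->nr_switches;
}

/*
 * Hypothetical consumer: number of switches between two samples.
 * Unsigned subtraction tolerates wraparound, but only if the counter
 * never decreases between the two samples.
 */
static uint64_t switches_since(uint64_t prev_sample, uint64_t cur_sample)
{
	return cur_sample - prev_sample;
}

/*
 * Replays the unlucky interleaving: the reader samples once after the
 * speculative increment and once after it has been undone.
 */
static uint64_t racy_delta(void)
{
	struct fake_cfs_rq cfs_rq = { .nr_switches = 100 };
	uint64_t first, second;

	cfs_rq.nr_switches++;		/* increment inside the pick loop */
	first = sample(&cfs_rq);	/* reader observes 101 */
	cfs_rq.nr_switches--;		/* p == prev, so the increment is undone */
	second = sample(&cfs_rq);	/* reader observes 100 */

	assert(second < first);		/* the counter went backward */
	return switches_since(first, second);
}
```

In this sketch racy_delta() returns UINT64_MAX rather than a small
number, so a consumer deriving a rate from it would report an absurd
burst of context switches.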

Thanks.

--
tejun