Re: [RFC][PATCH 4/4] sched, numa: Ignore pinned tasks

From: Peter Zijlstra
Date: Sat May 16 2015 - 05:32:23 EST


On Fri, May 15, 2015 at 05:43:37PM +0200, Peter Zijlstra wrote:
> static void account_numa_enqueue(struct rq *rq, struct task_struct *p)
> {
> +	if (p->nr_cpus_allowed == 1) {
> +		p->numa_preferred_nid = -1;
> +		rq->nr_pinned_running++;
> +	}
> 	rq->nr_numa_running += (p->numa_preferred_nid != -1);
> 	rq->nr_preferred_running += (p->numa_preferred_nid == task_node(p));
> }

> static inline enum fbq_type fbq_classify_rq(struct rq *rq)
> {
> +	unsigned int nr_migratable = rq->cfs.h_nr_running - rq->nr_pinned_running;
> +

FWIW, there's a problem there with CFS bandwidth muck. When we throttle
groups we update cfs.h_nr_running properly, but we do not hierarchically
account the pinned, preferred and numa counts.

So the subtraction above can end up negative, and since nr_migratable
is unsigned it would wrap around to a huge value.
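
To make the failure mode concrete, here is a minimal userspace sketch,
not kernel code. struct toy_rq and the toy_*/nr_migratable_* helpers
are hypothetical stand-ins invented for illustration; only the two
counter names come from the patch. The clamped variant is just one
possible stopgap assumed here, not something proposed in this thread.

```c
#include <assert.h>

/* Hypothetical, stripped-down stand-in for struct rq (not the kernel layout). */
struct toy_rq {
	unsigned int h_nr_running;      /* hierarchical count; drops on throttle */
	unsigned int nr_pinned_running; /* flat count; throttle does not touch it */
};

/* Model throttling a group holding nr_queued tasks: only h_nr_running moves. */
static void toy_throttle(struct toy_rq *rq, unsigned int nr_queued)
{
	assert(nr_queued <= rq->h_nr_running);
	rq->h_nr_running -= nr_queued;
}

/* The subtraction as written in the patch; wraps if pinned > h_nr_running. */
static unsigned int nr_migratable_raw(const struct toy_rq *rq)
{
	return rq->h_nr_running - rq->nr_pinned_running;
}

/* Assumed stopgap: treat "more pinned than running" as zero migratable. */
static unsigned int nr_migratable_clamped(const struct toy_rq *rq)
{
	if (rq->nr_pinned_running >= rq->h_nr_running)
		return 0;
	return rq->h_nr_running - rq->nr_pinned_running;
}
```

With three runnable tasks of which two are pinned and sit in a group
that then gets throttled, h_nr_running drops to 1 while
nr_pinned_running stays at 2, so the raw subtraction wraps. Clamping
only hides the stale count; it does not fix the accounting.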

I've not yet decided what to do about this; ideally we'd do the
hierarchical accounting of the numa stats -- but that's a little bit
more expensive than I'd like.

Ah well, that's for Monday.