Re: [RFC][PATCH 00/26] sched/numa

From: Avi Kivity
Date: Mon Mar 19 2012 - 08:16:51 EST


On 03/19/2012 02:09 PM, Peter Zijlstra wrote:
> On Mon, 2012-03-19 at 13:42 +0200, Avi Kivity wrote:
> > > That's intentional, it keeps the work accounted to the tasks that need
> > > it.
> >
> > The accounting part is good, the extra latency is not. If you have
> > spare resources (processors or dma engines) you can employ for eager
> > migration why not make use of them.
>
> Afaik we do not use dma engines for memory migration.

We don't, but I think we should.

> In any case, if you do cross-node migration frequently enough that the
> overhead of copying pages is a significant part of your time then I'm
> guessing there's something wrong.
>
> If not, the latency should be armortised enough to not matter.

Amortization is okay for HPC style applications but not for interactive
applications (including servers). It all depends on the numbers of
course, maybe migrate on fault is okay, we'll need to measure it somehow.

> > > > - doesn't work with dma engines
> > >
> > > How does that work anyway? You'd have to reprogram your dma engine, so
> > > either the ->migratepage() callback does that and we're good either way,
> > > or it simply doesn't work at all.
> >
> > If it's called from the faulting task's context you have to sleep, and
> > the latency gets increased even more, plus you're dependant on the dma
> > engine's backlog. If you do all that from a background thread you don't
> > have to block (you might have to cancel or discard a migration if the
> > page was changed while being copied).
>
> The current MoF implementation simply bails and uses the old page. It
> will never block.

Then it can not use a dma engine.

> Its all a best effort approach, a 'few' stray pages is OK as long as the
> bulk of the pages are local.
>
> If you're concerned, we can add per mm/vma counters to track this.

These are second and third order effects. Overall I'm happy, kvm is one
of the workloads most severely impacted by the current numa support and
this looks like it addresses most of the issues.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/