Re: [PATCH 00/46] Automatic NUMA Balancing V4

From: Ingo Molnar
Date: Wed Nov 21 2012 - 13:21:54 EST



* Mel Gorman <mgorman@xxxxxxx> wrote:

> On Wed, Nov 21, 2012 at 06:33:16PM +0100, Ingo Molnar wrote:
> >
> > * Mel Gorman <mgorman@xxxxxxx> wrote:
> >
> > > On Wed, Nov 21, 2012 at 06:03:06PM +0100, Ingo Molnar wrote:
> > > >
> > > > * Mel Gorman <mgorman@xxxxxxx> wrote:
> > > >
> > > > > On Wed, Nov 21, 2012 at 10:21:06AM +0000, Mel Gorman wrote:
> > > > > >
> > > > > > I am not including a benchmark report in this but will be posting one
> > > > > > shortly in the "Latest numa/core release, v16" thread along with the latest
> > > > > > schednuma figures I have available.
> > > > > >
> > > > >
> > > > > Report is linked here https://lkml.org/lkml/2012/11/21/202
> > > > >
> > > > > I ended up cancelling the remaining tests and restarted with
> > > > >
> > > > > 1. schednuma + patches posted since so that works out as
> > > >
> > > > Mel, I'd like to ask you to refer to our tree as numa/core or
> > > > 'numacore' in the future. Would such a courtesy to use the
> > > > current name of our tree be possible?
> > > >
> > >
> > > Sure, no problem.
> >
> > Thanks!
> >
> > I ran a quick test with your 'balancenuma v4' tree and while
> > numa02 and numa01-THREAD-ALLOC performance is looking good,
> > numa01 performance does not look very good:
> >
> >              mainline    numa/core    balancenuma-v4
> >   numa01:    340.3       139.4        276 secs
> >
> > 97% slower than numa/core.
> >
>
> It would be. numa01 is an adverse workload where all threads
> are hammering the same memory. The two-stage filter in
> balancenuma restricts the amount of migration it does so it
> ends up in a situation where it cannot balance properly. [...]

Do you mean this "balancenuma v4" patch attributed to you:

Subject: mm: Numa: Use a two-stage filter to restrict pages being migrated for unlikely task<->node relationships
From: Mel Gorman <mgorman@xxxxxxx>
Date: Wed, 21 Nov 2012 10:21:42 +0000

...

Signed-off-by: Mel Gorman <mgorman@xxxxxxx>

which has:

/*
* Multi-stage node selection is used in conjunction
* with a periodic migration fault to build a temporal
* task<->page relation. By using a two-stage filter we
* remove short/unlikely relations.
*
* Using P(p) ~ n_p / n_t as per frequentist
* probability, we can equate a task's usage of a
* particular page (n_p) per total usage of this
* page (n_t) (in a given time-span) to a probability.
*
* Our periodic faults will sample this probability and
* getting the same result twice in a row, given these
* samples are fully independent, is then given by
* P(n)^2, provided our sample period is sufficiently
* short compared to the usage pattern.
*
* This quadric squishes small probabilities, making
* it less likely we act on an unlikely task<->page
* relation.

This looks very similar to the code and text that Peter wrote
for numa/core:

/*
* Multi-stage node selection is used in conjunction with a periodic
* migration fault to build a temporal task<->page relation. By
* using a two-stage filter we remove short/unlikely relations.
*
* Using P(p) ~ n_p / n_t as per frequentist probability, we can
* equate a task's usage of a particular page (n_p) per total usage
* of this page (n_t) (in a given time-span) to a probability.
*
* Our periodic faults will then sample this probability and getting
* the same result twice in a row, given these samples are fully
* independent, is then given by P(n)^2, provided our sample period
* is sufficiently short compared to the usage pattern.
*
* This quadric squishes small probabilities, making it less likely
* we act on an unlikely task<->page relation.
*
* Return the best node ID this page should be on, or -1 if it should
* stay where it is.
*/

see commit:

30f93abc6cb3 sched, numa, mm: Add the scanning page fault machinery

?

I think it's the very same concept - yours is taken from an
older sched/numa commit and attributed to yourself? [If so then
please fix the attribution.]

We have the same filter in numa/core, because we wrote it (FYI,
I wrote bits of the last_cpu variant in numa/core), yet our
numa01 performance is much better than that of balancenuma.
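
For readers following along, the two-stage filter described in the
quoted comment can be sketched roughly as below. This is a minimal
user-space illustration and not the code from either tree: it assumes
the filter works by remembering which node took the previous sampling
fault on a page, and the names struct page_sample,
two_stage_should_migrate() and two_stage_pass_probability() are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-page state; not the kernel's struct page. */
struct page_sample {
	int last_nid;	/* node that took the previous sampling fault, -1 if none */
};

/*
 * Two-stage filter: report a migration candidate only when the node
 * taking this fault is the same node that took the previous fault on
 * this page. The current node is always recorded for the next sample.
 */
static bool two_stage_should_migrate(struct page_sample *page, int this_nid)
{
	assert(page != NULL);

	int prev_nid = page->last_nid;

	page->last_nid = this_nid;
	return prev_nid == this_nid;
}

/*
 * If a node accounts for a fraction p of the accesses to a page and
 * consecutive samples are independent, the chance that it passes the
 * filter on a given pair of samples is p squared.
 */
static double two_stage_pass_probability(double p)
{
	return p * p;
}
```

Under these assumptions, a page whose faults are split evenly between
two nodes gives each node p = 0.5, so each passes the filter with
probability 0.25. A workload where every thread touches the same
memory therefore rarely yields two consecutive faults from the same
node.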

Thanks,

Ingo