[PATCH 00/35] AutoNUMA alpha14

From: Andrea Arcangeli
Date: Fri May 25 2012 - 13:06:25 EST


Hello everyone,

It's time for a new autonuma-alpha14 milestone.

Removed the [RFC] from Subject because 1) this is a release I'm quite
happy with (from the implementation side it allows the same kernel
image to boot optimally on NUMA and not-NUMA hardware and it avoids
altering the scheduler runtime most of the time) and 2) because of the
great benchmark results we got so far, showing this design so far has
been proved to perform best.

I believe (realistically speaking) nobody is going to change
applications to specify which thread is using which memory (for
threaded apps) with the only exception of QEMU and a few others.

For not threaded apps that fits in a NUMA node, there's no way a blind
home node can perform nearly as good as AutoNUMA: AutoNUMA monitor the
whole status of the memory of the running processes and it optimizes
the memory placement and CPU placement dynamically
accordingly. There's a small memory and CPU cost in collecting so much
information to be able to make smart decisions, but the benefits
largely outweight those costs.

If a big idle task was idle for a long while, but it suddenly start
computing, AutoNUMA may totally change the memory and CPU placement of
the other running tasks according to what's best, because it has
enough information to take optimal NUMA placement decisions.

git clone --reference linux -b autonuma-alpha14 git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git autonuma-alpha14

Development autonuma branch (currently equal to autonuma-alpha14 ==
a49fedcc284a8e8b47175fbc23e9d3b075884e53):

git clone --reference linux -b autonuma git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git
to update: git fetch; git checkout -f origin/autonuma

Changelog from alpha13 to alpha14:

o page_autonuma introduction, no memory wasted if the kernel is booted
on not-NUMA hardware. Tested with flatmem/sparsemem on x86
autonuma=y/n and sparsemem/vsparsemem on x86_64 with autonuma=y/n.

The "noautonuma" kernel param disables autonuma permanently also
when booted on NUMA hardware (no /sys/kernel/mm/autonuma, and no
page_autonuma allocations, like cgroup_disable=memory)

o autonuma_balance only runs along with run_rebalance_domains, to
avoid altering the scheduler runtime. autonuma_balance gives a
"kick" to the scheduler only along the load balance events (it
overrides the load balance activity if needed). This change has not
yet been tested on specjbb or more schedule intensive benchmarks,
but I don't expect measurable NUMA affinity regressions. For
intensive compute loads not involving a flood of scheduling activity
this has already been verified not to show any performance
regression, and it will boost the scheduler performance compared to
previous autonuma releases.

Note: autonuma_balance still runs from normal context (not softirq
context like run_rebalance_domains) to be able to wait on process
migration (avoid _nowait), but most of the time it does nothing at
all.

Changelog from alpha11 to alpha13:

o autonuma_balance optimization (take the fast path when process is in
the preferred NUMA node)

TODO:

o THP native migration (orthogonal and also needed for
cpuset/migrate_pages(2)/numa/sched).

o distribute pagecache to other nodes (and maybe shared memory or
other movable memory) if knuma_migrated stops because the local node
is full

Andrea Arcangeli (35):
mm: add unlikely to the mm allocation failure check
autonuma: make set_pmd_at always available
xen: document Xen is using an unused bit for the pagetables
autonuma: define _PAGE_NUMA_PTE and _PAGE_NUMA_PMD
autonuma: x86 pte_numa() and pmd_numa()
autonuma: generic pte_numa() and pmd_numa()
autonuma: teach gup_fast about pte_numa
autonuma: introduce kthread_bind_node()
autonuma: mm_autonuma and sched_autonuma data structures
autonuma: define the autonuma flags
autonuma: core autonuma.h header
autonuma: CPU follow memory algorithm
autonuma: add page structure fields
autonuma: knuma_migrated per NUMA node queues
autonuma: init knuma_migrated queues
autonuma: autonuma_enter/exit
autonuma: call autonuma_setup_new_exec()
autonuma: alloc/free/init sched_autonuma
autonuma: alloc/free/init mm_autonuma
autonuma: avoid CFS select_task_rq_fair to return -1
autonuma: teach CFS about autonuma affinity
autonuma: sched_set_autonuma_need_balance
autonuma: core
autonuma: follow_page check for pte_numa/pmd_numa
autonuma: default mempolicy follow AutoNUMA
autonuma: call autonuma_split_huge_page()
autonuma: make khugepaged pte_numa aware
autonuma: retain page last_nid information in khugepaged
autonuma: numa hinting page faults entry points
autonuma: reset autonuma page data when pages are freed
autonuma: initialize page structure fields
autonuma: link mm/autonuma.o and kernel/sched/numa.o
autonuma: add CONFIG_AUTONUMA and CONFIG_AUTONUMA_DEFAULT_ENABLED
autonuma: boost khugepaged scanning rate
autonuma: page_autonuma

arch/x86/include/asm/paravirt.h | 2 -
arch/x86/include/asm/pgtable.h | 51 ++-
arch/x86/include/asm/pgtable_types.h | 22 +-
arch/x86/mm/gup.c | 2 +-
fs/exec.c | 3 +
include/asm-generic/pgtable.h | 12 +
include/linux/autonuma.h | 53 ++
include/linux/autonuma_flags.h | 68 ++
include/linux/autonuma_sched.h | 50 ++
include/linux/autonuma_types.h | 88 ++
include/linux/huge_mm.h | 2 +
include/linux/kthread.h | 1 +
include/linux/mm_types.h | 5 +
include/linux/mmzone.h | 18 +
include/linux/page_autonuma.h | 53 ++
include/linux/sched.h | 3 +
init/main.c | 2 +
kernel/fork.c | 36 +-
kernel/kthread.c | 23 +
kernel/sched/Makefile | 1 +
kernel/sched/core.c | 12 +-
kernel/sched/fair.c | 72 ++-
kernel/sched/numa.c | 281 +++++++
kernel/sched/sched.h | 10 +
mm/Kconfig | 13 +
mm/Makefile | 1 +
mm/autonuma.c | 1464 ++++++++++++++++++++++++++++++++++
mm/huge_memory.c | 58 ++-
mm/memory.c | 36 +-
mm/mempolicy.c | 15 +-
mm/mmu_context.c | 2 +
mm/page_alloc.c | 6 +-
mm/page_autonuma.c | 234 ++++++
mm/sparse.c | 126 +++-
34 files changed, 2776 insertions(+), 49 deletions(-)
create mode 100644 include/linux/autonuma.h
create mode 100644 include/linux/autonuma_flags.h
create mode 100644 include/linux/autonuma_sched.h
create mode 100644 include/linux/autonuma_types.h
create mode 100644 include/linux/page_autonuma.h
create mode 100644 kernel/sched/numa.c
create mode 100644 mm/autonuma.c
create mode 100644 mm/page_autonuma.c
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/