[PATCH] node affine NUMA scheduler 1/5

From: Erich Focht (efocht@ess.nec.de)
Date: Mon Oct 14 2002 - 06:06:24 EST


I am resending these because for some unknown reason they don't seem
to have made it into either the MARC archives or those at
www.cs.helsinki.fi.

---------- Resent Message ----------

Subject: [PATCH] node affine NUMA scheduler 1/5
Date: Fri, 11 Oct 2002 19:54:30 +0200

Hi,

here comes the complete set of patches for the node affine NUMA
scheduler. It's made of several building blocks and one can make
several flavors of NUMA schedulers out of the patches.

The patches are:

01-numa_sched_core-2.5.39-10.patch :
       Provides basic NUMA functionality. It implements CPU pools
       with all the mess needed to initialize them. It also has a
       node aware find_busiest_queue() which first scans its own
       node for more loaded CPUs. If no steal candidate is found on
       its own node, it finds the most loaded node and tries to steal
       a task from it. Steals from remote nodes are delayed, which is
       how it tries to achieve equal node load. These delays can be
       extended to cope with multi-level node hierarchies (that patch
       is not included).

02-numa_sched_ilb-2.5.39-10.patch :
       This patch provides simple initial load balancing during exec().
       It is node aware and will select the least loaded node. Also it
       does a round-robin initial node selection to distribute the load
       better across the nodes.

03-node_affine-2.5.39-10.patch :
       This is the heart of the node affine part of the patch. Tasks
       are assigned a homenode during initial load balancing and they
       are attracted to the homenode.

04-alloc_on_homenode.patch :
       Coupling with the memory allocator: memory for user tasks is
       allocated from the homenode, no matter which node the task is
       currently scheduled on.

05-dynamic_homenode-2.5.39-10.patch :
       Dynamic homenode selection. When pages are allocated or freed
       they are tracked. The homenode is recalculated dynamically and
       set to the node where most of the memory of the task is allocated.
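
To make the two-stage search described for 01 more concrete, here is a
rough user-space sketch in C. It is not code from the patch: the names
(cpu_to_node, busiest_cpu_on_node, find_busiest_cpu), the fixed
8 CPU / 2 node topology and the plain integer loads are all assumptions
made for illustration, and the steal delays for remote nodes are left
out entirely.

```c
#define NR_CPUS   8
#define NR_NODES  2

/* Assumed topology: CPUs 0-3 sit on node 0, CPUs 4-7 on node 1. */
static int cpu_to_node(int cpu)
{
    return cpu / (NR_CPUS / NR_NODES);
}

/* Most loaded CPU on 'node' whose load exceeds 'min_load', or -1. */
static int busiest_cpu_on_node(const int load[], int node, int min_load)
{
    int best = -1, best_load = min_load;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (cpu_to_node(cpu) == node && load[cpu] > best_load) {
            best = cpu;
            best_load = load[cpu];
        }
    }
    return best;
}

/* Returns the CPU to steal from, or -1 if there is no candidate. */
static int find_busiest_cpu(const int load[], int this_cpu)
{
    int this_node = cpu_to_node(this_cpu);
    int node_load[NR_NODES] = { 0 };
    int busiest_node = -1;
    int victim;

    /* Stage 1: look for a more loaded CPU on our own node. */
    victim = busiest_cpu_on_node(load, this_node, load[this_cpu]);
    if (victim >= 0)
        return victim;

    /* Stage 2: pick the most loaded remote node, if it beats ours. */
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        node_load[cpu_to_node(cpu)] += load[cpu];

    for (int node = 0; node < NR_NODES; node++) {
        if (node == this_node || node_load[node] <= node_load[this_node])
            continue;
        if (busiest_node < 0 || node_load[node] > node_load[busiest_node])
            busiest_node = node;
    }
    if (busiest_node < 0)
        return -1;

    return busiest_cpu_on_node(load, busiest_node, load[this_cpu]);
}
```

With per-CPU loads of { 1, 1, 1, 1, 2, 6, 3, 2 } and this_cpu = 0 the
own node offers no candidate, so the sketch falls through to node 1 and
picks CPU 5.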

Meaningful combinations of patches are:

A : numa scheduler : 01 + 02
      node aware NUMA scheduler with initial load balancing

B : node affine scheduler : 01 + 02 + 03 (+ 04)

C : node affine scheduler with dynamic homenode selection :
      01 + 02 + 03 + 05 (exclude 04!)

The best results should be provided by C as it incorporates most of
the features.
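
Since C depends on the dynamic homenode selection of 05, here is a
minimal sketch of the bookkeeping idea in plain C. Again this is only
an illustration under assumptions, not the patch itself: struct
task_mem, account_pages() and recalc_homenode() are hypothetical names,
NR_NODES is fixed at 4, and the homenode is recalculated on every page
count change, which the real code need not do.

```c
#define NR_NODES 4

/* Hypothetical per-task bookkeeping. */
struct task_mem {
    long pages[NR_NODES];   /* pages currently allocated on each node */
    int  homenode;          /* node the task is attracted to */
};

/* Set the homenode to the node holding most of the task's pages.
 * On a tie the current homenode is kept, to avoid needless flips. */
static void recalc_homenode(struct task_mem *t)
{
    int best = t->homenode;

    for (int node = 0; node < NR_NODES; node++)
        if (t->pages[node] > t->pages[best])
            best = node;
    t->homenode = best;
}

/* Track an allocation (delta > 0) or a free (delta < 0) on 'node'. */
static void account_pages(struct task_mem *t, int node, long delta)
{
    t->pages[node] += delta;
    recalc_homenode(t);
}
```

A task that starts on node 0 and then allocates 10 pages on node 1 ends
up with homenode 1. Allocating another 10 pages on node 2 is a tie, so
the homenode stays where it is.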

The patches should run on ia32 NUMAQ and ia64 Azusa (with the topology
patches applied). Other architectures just need a build_node() call
similar to the one in arch/i386/kernel/smpboot.c. The issues with NUMAQ
(uninitialized platform specific stuff) should be solved.

Comments, flames, etc... welcome ;-)

Best regards,
Erich





