Re: [Lse-tech] Node affine NUMA scheduler extension

From: Erich Focht (efocht@hpce.nec.com)
Date: Tue May 27 2003 - 04:55:43 EST


On Tuesday 27 May 2003 11:11, Andi Kleen wrote:
> On Tue, May 27, 2003 at 10:31:55AM +0200, Erich Focht wrote:
> > This patch is an adaptation of the earlier work on the node affine
> > NUMA scheduler to the NUMA features meanwhile integrated into
> > 2.5. Compared to the patch posted for 2.5.39 this one is much simpler
> > and easier to understand.
>
> Yes, it is also much simpler than my implementation (I did a similar
> homenode scheduler for an 2.4 kernel). The basic principles are the same.

:-)

> But the main problems I have is that the tuning for threads is very
> difficult. On AMD64 where Node equals CPU it is important
> to home node balance threads too. After some experiments I settled on
> homenode assignment on the first load balance (called "lazy homenode")
> When a thread clones it initially executes on the CPU of the parent, but
> there is a window until the first load balance tick where it can allocate
> memory on the wrong node. I found a lot of code runs very badly until the
> cache decay parameter is set to 0 (no special cache affinity) to allow
> quick initial migration.

Interesting observation, I didn't make it when I tried the lazy
homenode (quite a while ago). But I was focusing on MPI jobs. So what
if we add a condition to CAN_MIGRATE which disables the cache affinity
before the first load balance?

> Migration directly on fork/clone requires a lot
> of changes and also breaks down on some benchmarks.

Hmmm, I wouldn't allow this to any task/child, only to special
ones. Under 2.4 I currently use a sched_balance_fork() function
similar to sched_balance_exec(). Tasks have a default initial load
balancing policy of being migrated (and selecting the homenode) at
exec(). This can be changed (with prctl) to fork(). The ilb policy is
inheritable. Works fine for OpenMP jobs.

Regards,
Erich


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/