Re: [Lse-tech] Node affine NUMA scheduler extension

From: Andi Kleen (ak@suse.de)
Date: Tue May 27 2003 - 05:05:02 EST


On Tue, May 27, 2003 at 11:54:52AM +0200, Erich Focht wrote:
> > But the main problems I have is that the tuning for threads is very
> > difficult. On AMD64 where Node equals CPU it is important
> > to home node balance threads too. After some experiments I settled on
> > homenode assignment on the first load balance (called "lazy homenode")
> > When a thread clones it initially executes on the CPU of the parent, but
> > there is a window until the first load balance tick where it can allocate
> > memory on the wrong node. I found a lot of code runs very badly until the
> > cache decay parameter is set to 0 (no special cache affinity) to allow
> > quick initial migration.
>
> Interesting observation, I didn't make it when I tried the lazy
> homenode (quite a while ago). But I was focusing on MPI jobs. So what
> if we add a condition to CAN_MIGRATE which disables the cache affinity
> before the first load balance?

What I currently have is two cache decay variables: one is used if the
homenode is not assigned, the other otherwise. Both are sysctls too.
This obviously only works with lazy homenode, but the state is the same.
I'm still not completely happy with it though.
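
Roughly, the two-variable check looks like the user-space sketch below. This
is only an illustration, not the actual patch: the struct, the NODE_NONE
marker, the variable names and the tick values are all made up here, and the
real migration test looks at more than cache hotness.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical marker for "no homenode assigned yet". */
#define NODE_NONE (-1)

/* Hypothetical, heavily reduced task descriptor. */
struct task {
    int homenode;            /* NODE_NONE until the first load balance */
    unsigned long last_run;  /* tick at which the task last ran */
};

/* Two tunables (sysctls in the patch); names and values are assumptions. */
static unsigned long cache_decay_no_homenode = 0;  /* 0 = no cache affinity */
static unsigned long cache_decay_homenode = 10;    /* in ticks */

/* Pick the decay value depending on whether a homenode is assigned. */
static unsigned long cache_decay_for(const struct task *t)
{
    return t->homenode == NODE_NONE ? cache_decay_no_homenode
                                    : cache_decay_homenode;
}

/* A task counts as cache-hot while it ran fewer than "decay" ticks ago. */
static bool task_cache_hot(const struct task *t, unsigned long now)
{
    assert(now >= t->last_run);
    return now - t->last_run < cache_decay_for(t);
}

/* Simplified migration test: only cache-cold tasks may be moved. */
static bool can_migrate(const struct task *t, unsigned long now)
{
    return !task_cache_hot(t, now);
}
```

With a decay of 0 for the unassigned case, a freshly cloned thread is never
treated as cache-hot, so the first load balance is free to move it.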

Why exactly did you give up on using the lazy homenode?

>
> > Migration directly on fork/clone requires a lot
> > of changes and also breaks down on some benchmarks.
>
> Hmmm, I wouldn't allow this to any task/child, only to special
> ones. Under 2.4 I currently use a sched_balance_fork() function

Yes, I agree.

> similar to sched_balance_exec(). Tasks have a default initial load
> balancing policy of being migrated (and selecting the homenode) at
> exec(). This can be changed (with prctl) to fork(). The ilb policy is
> inheritable. Works fine for OpenMP jobs.

Hmm, I should try that I guess. Where do you call it? At the end of do_fork?
I tried to hack up wake_up_forked_process() to do it, but it required
large scale changes all over the scheduler so I eventually gave up.
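
A rough user-space sketch of the inheritable initial load balancing policy
described above, to make the fork/exec distinction concrete. Everything here
is an assumption for illustration: the enum, the struct, the function names
and the "least loaded node" choice are stand-ins, not the 2.4 code.

```c
#include <assert.h>

/* Hypothetical policy values: where the initial balancing happens. */
enum ilb_policy { ILB_EXEC, ILB_FORK };

/* Hypothetical, heavily reduced task descriptor. */
struct task {
    enum ilb_policy ilb;  /* inherited by children */
    int homenode;
};

/* Stand-in for node selection: take the node with the fewest tasks. */
static int least_loaded_node(const int *node_load, int nr_nodes)
{
    int best = 0;

    assert(nr_nodes > 0);
    for (int n = 1; n < nr_nodes; n++)
        if (node_load[n] < node_load[best])
            best = n;
    return best;
}

/* fork: the child copies the parent's policy and homenode, and is only
 * rebalanced right away if the policy says so. */
static struct task fork_task(const struct task *parent,
                             int *node_load, int nr_nodes)
{
    struct task child = *parent;

    if (child.ilb == ILB_FORK)
        child.homenode = least_loaded_node(node_load, nr_nodes);
    node_load[child.homenode]++;
    return child;
}

/* exec: the default policy rebalances and selects the homenode here. */
static void exec_task(struct task *t, int *node_load, int nr_nodes)
{
    if (t->ilb != ILB_EXEC)
        return;
    node_load[t->homenode]--;
    t->homenode = least_loaded_node(node_load, nr_nodes);
    node_load[t->homenode]++;
}
```

In this sketch a task that switched to ILB_FORK spreads its children at
fork() time, while everything else stays on the parent's node until exec().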

-Andi