Re: [rfc] balance-on-fork NUMA placement

From: Martin Bligh
Date: Wed Aug 01 2007 - 13:53:52 EST


Nick Piggin wrote:
On Tue, Jul 31, 2007 at 11:14:08AM +0200, Andi Kleen wrote:
On Tuesday 31 July 2007 07:41, Nick Piggin wrote:

I haven't given this idea testing yet, but I just wanted to get some
opinions on it first. NUMA placement still isn't ideal (eg. tasks with
a memory policy will not do any placement, and process migrations of
course will leave the memory behind...), but it does give a bit more
chance for the memory controllers and interconnects to get evenly
loaded.
I didn't think slab honored mempolicies by default? At least you seem to need to set special process flags.

NUMA balance-on-fork code is in a good position to allocate all of a new
process's memory on a chosen node. However, it really only starts
allocating on the correct node after the process starts running.

task and thread structures, stack, mm_struct, vmas, page tables etc. are
all allocated on the parent's node.
The page tables should be only allocated when the process runs; except
for the PGD.

We certainly used to copy all page tables on fork. Not any more, but we
must still copy anonymous page tables.

This topic seems to come up periodically every since we first introduced
the NUMA scheduler, and every time we decide it's a bad idea. What's
changed? What workloads does this improve (aside from some artificial
benchmark like stream)?

To repeat the conclusions of last time ... the primary problem is that
99% of the time, we exec after we fork, and it makes that fork/exec
cycle slower, not faster, so exec is generally a much better time to do
this. There's no good predictor of whether we'll exec after fork, unless
one has magically appeared since late 2.5.x ?

M.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/