Re: How can I optimize a process on a NUMA architecture(x86-64 specifically)?

From: Brett E.
Date: Fri May 21 2004 - 13:15:09 EST

Next message: Christian Kujau: "kernel BUG at fs/buffer.c:1270! [TAINTED]"
Previous message: Sau Dan Lee: "Re: swsusp: fix devfs breakage introduced in 2.6.6"
In reply to: Martin J. Bligh: "Re: How can I optimize a process on a NUMA architecture(x86-64 specifically)?"
Next in thread: Martin J. Bligh: "Re: How can I optimize a process on a NUMA architecture(x86-64 specifically)?"

Martin J. Bligh wrote:

Say you have a bunch of single-threaded processes on a NUMA machine. Does the kernel prefer to allocate a process's memory from a particular CPU's local memory, and prefer to run that process on the CPU that holds its memory? Or should I use the NUMA API (libnuma) to spell this out to the kernel? Does the kernel do the right thing in this case?

The kernel will generally do the right thing (process local alloc) by
default. In 99% of cases, you don't want to muck with it - unless you're
running one single app dominating the whole system, and nothing else is
going on, you probably don't want to specify anything explicitly.

M.

Let's say I have a 2-way Opteron and want to run 4 long-lived processes. I fork and exec to create the first process; it runs on processor 0 because processor 1 is overloaded at that time, so its homenode is processor 0.

There is no such thing as a homenode. What you describe is more or less why
we ditched that concept.

I fork and exec another, and it also chooses processor 0 since processor 1 is overloaded at that time. Let's say this uneven distribution holds for all 4 processes, so all of them are mapped to processor 0. They allocate on node 0, yet the scheduler will spread them across both processors, since CPU load should be balanced. In that case, processes running on the second processor will have to fetch their data from the first processor's memory.

Each process will allocate local to the CPU it is running on when it does the
allocate, so it's difficult to get as off-kilter as you describe (though it
is still possible).
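To make the "allocate local to the CPU it is running on" behaviour concrete, here is a toy model in Python. It is only a sketch of the policy as described above, not kernel code: the names `simulate_local_alloc` and `remote_fraction`, the one-CPU-per-node layout, and the page counts are all made up for illustration, and it assumes every page is accessed equally often.

```python
from collections import Counter

def simulate_local_alloc(schedule, cpu_to_node):
    """Toy model of a 'local allocation' policy.

    schedule: list of (cpu, pages) steps in time order, meaning the
    process was running on `cpu` when it allocated `pages` pages.
    Returns a Counter mapping node -> pages placed on that node.
    """
    pages_per_node = Counter()
    for cpu, pages in schedule:
        # Each allocation lands on the node of the CPU in use right then.
        pages_per_node[cpu_to_node[cpu]] += pages
    return pages_per_node

def remote_fraction(pages_per_node, running_node):
    """Fraction of the process's pages that are NOT on `running_node`."""
    total = sum(pages_per_node.values())
    if total == 0:
        return 0.0
    return (total - pages_per_node[running_node]) / total

# Assumed layout for a 2-way Opteron: one CPU per node.
cpu_to_node = {0: 0, 1: 1}

# Process never migrates: everything ends up on node 0.
stays_put = simulate_local_alloc([(0, 100), (0, 100)], cpu_to_node)

# Process migrates between allocations: pages are split across nodes.
migrates = simulate_local_alloc([(0, 100), (1, 100)], cpu_to_node)

print(dict(stays_put), remote_fraction(stays_put, 0))
print(dict(migrates), remote_fraction(migrates, 0))
```

In this model a process only ends up with remote pages if it is moved to a CPU on another node between allocations, which matches the point that the lopsided case is possible but needs migrations to line up badly.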

So could process 0 run on processor 0, allocating local to processor 0, and then run on processor 1, allocating local to processor 1, so that it ends up allocating on both? Over time process 0's allocations would be split between both processors, defeating NUMA. The homenode concept plus explicit CPU pinning seem useful in that they let you take better advantage of NUMA. Without them, the kernel will just allocate on whichever CPU the process happens to be running on, when in fact a preference must be given to one CPU at some point, hopefully early in the life of the process, in order to take advantage of NUMA.
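For reference, explicit CPU pinning of the kind mentioned here can be sketched from userspace with the scheduler affinity call, which Python exposes on Linux as `os.sched_setaffinity`. The helper names `pin_to_cpu` and `demo` are hypothetical, and this is only a rough sketch: it restricts where the process runs and does not set a memory policy, which would need libnuma or similar. The idea is that, if allocations are local to the running CPU as described earlier in the thread, a process pinned early should keep allocating from that CPU's node.

```python
import os

def pin_to_cpu(cpu, pid=0):
    """Restrict process `pid` (0 = the caller) to a single CPU.

    Returns the resulting affinity set, or None on platforms where
    the affinity calls are not available (they are Linux-specific).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)

def demo():
    """Pin to the lowest allowed CPU, then restore the original mask."""
    if not hasattr(os, "sched_getaffinity"):
        return None
    original = os.sched_getaffinity(0)
    target = min(original)  # pick a CPU we are already allowed to use
    try:
        return pin_to_cpu(target)
    finally:
        # Undo the pinning so the rest of the program is unaffected.
        os.sched_setaffinity(0, original)

result = demo()
print("affinity while pinned:", result)
```

Pinning early, before the process has allocated much, is the part that matters for the argument above; pinning after the pages are already spread across nodes would not move them.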

I'm trying to play devil's advocate, by the way, so bear with me. You've been very helpful and I'm learning a great deal from you. :)

Thanks,

Brett
