Re: [GIT PULL tip:x86/mm]

From: David Rientjes
Date: Tue Mar 01 2011 - 12:19:16 EST


On Thu, 24 Feb 2011, Yinghai Lu wrote:

> DavidR reported that x86/mm broke his numa emulation with 128M etc.
>
> So wonder if that would hold you to push whole tip/x86/mm to Linus for .39
> or need to rebase it while taking the tip/x86/numa-emulation-unify out.
>

Ok, so 1f565a896ee1 (x86-64, NUMA: Fix size of numa_distance array) fixes
the boot failure when using numa=fake, but there's still another issue
that was introduced with regard to emulated distances between fake nodes
sitting on hardware using a SLIT.

This is important because we want to ensure that the physical topology of
the machine is still represented in an emulated environment to
appropriately describe the expected latencies between the nodes. It also
allows users who are using numa=fake purely as a debugging tool to test
more interesting configurations and benchmark memory accesses between
emulated nodes as though they were real.

For example, on my four-node system with a custom SLIT, these are the
distances when booting without numa=fake:

$ cat /sys/devices/system/node/node*/distance
10 20 20 30
20 10 20 20
20 20 10 20
30 20 20 10

These physical nodes are all symmetric in size.

With numa=fake=16, we expect to see the fake nodes interleaved (the
default) over the set of physical nodes. That suggests the distance
files for these nodes should be:

10 20 20 30 10 20 20 30 10 20 20 30 10 20 20 30
20 20 10 20 20 20 10 20 20 20 10 20 20 20 10 20
30 20 20 10 30 20 20 10 30 20 20 10 30 20 20 10
10 20 20 30 10 20 20 30 10 20 20 30 10 20 20 30
20 10 20 20 20 10 20 20 20 10 20 20 20 10 20 20
20 20 10 20 20 20 10 20 20 20 10 20 20 20 10 20
30 20 20 10 30 20 20 10 30 20 20 10 30 20 20 10
20 10 20 20 20 10 20 20 20 10 20 20 20 10 20 20
20 20 10 20 20 20 10 20 20 20 10 20 20 20 10 20
30 20 20 10 30 20 20 10 30 20 20 10 30 20 20 10
10 20 20 30 10 20 20 30 10 20 20 30 10 20 20 30
20 10 20 20 20 10 20 20 20 10 20 20 20 10 20 20
20 20 10 20 20 20 10 20 20 20 10 20 20 20 10 20
30 20 20 10 30 20 20 10 30 20 20 10 30 20 20 10
10 20 20 30 10 20 20 30 10 20 20 30 10 20 20 30
20 10 20 20 20 10 20 20 20 10 20 20 20 10 20 20

(And that is what we see with 2.6.37.)
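
As a rough illustration of where those numbers come from, the sketch below
derives the emulated distances from the physical SLIT. It assumes that fake
node i is carved out of physical node i % 4 and simply inherits that node's
SLIT row; the names PHYS_SLIT and expected_fake_distances are made up for
this example and are not kernel symbols. Rows are printed in numeric node
order, whereas a `cat node*/distance` listing follows the shell's glob
order, so the row sequence may differ.

```python
# Physical SLIT from the four-node system quoted above
# (row i holds the distances from physical node i).
PHYS_SLIT = [
    [10, 20, 20, 30],
    [20, 10, 20, 20],
    [20, 20, 10, 20],
    [30, 20, 20, 10],
]


def expected_fake_distances(phys, nr_fake):
    """Distance matrix for nr_fake emulated nodes interleaved over phys.

    Assumption: fake node i sits on physical node i % len(phys) and
    inherits that physical node's distances to every other node.
    """
    n = len(phys)
    return [[phys[i % n][j % n] for j in range(nr_fake)]
            for i in range(nr_fake)]


if __name__ == "__main__":
    # Print one row per emulated node, in numeric node order.
    for row in expected_fake_distances(PHYS_SLIT, 16):
        print(" ".join(str(d) for d in row))
```

Under that assumption, two fake nodes on the same physical node are at
distance 10 from each other, and the 30 between physical nodes 0 and 3
carries over to every pair of fake nodes that sit on them.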

However, x86/mm describes these distances differently:

node0/distance:10 20 20 20 10 20 20 20 10 20 20 20 10 20 20 20
node1/distance:10 10 20 20 10 20 20 20 10 20 20 20 10 20 20 20
node2/distance:10 20 10 20 10 20 20 20 10 20 20 20 10 20 20 20
node3/distance:10 20 20 10 10 20 20 20 10 20 20 20 10 20 20 20
node4/distance:10 20 20 20 10 20 20 20 10 20 20 20 10 20 20 20
node5/distance:10 20 20 20 10 10 20 20 10 20 20 20 10 20 20 20
node6/distance:10 20 20 20 10 20 10 20 10 20 20 20 10 20 20 20
node7/distance:10 20 20 20 10 20 20 10 10 20 20 20 10 20 20 20
node8/distance:10 20 20 20 10 20 20 20 10 20 20 20 10 20 20 20
node9/distance:10 20 20 20 10 20 20 20 10 10 20 20 10 20 20 20
node10/distance:10 20 20 20 10 20 20 20 10 20 10 20 10 20 20 20
node11/distance:10 20 20 20 10 20 20 20 10 20 20 10 10 20 20 20
node12/distance:10 20 20 20 10 20 20 20 10 20 20 20 10 20 20 20
node13/distance:10 20 20 20 10 20 20 20 10 20 20 20 10 10 20 20
node14/distance:10 20 20 20 10 20 20 20 10 20 20 20 10 20 10 20
node15/distance:10 20 20 20 10 20 20 20 10 20 20 20 10 20 20 10

It looks as though the emulation changes sitting in x86/mm have dropped
the SLIT and are merely describing the emulated nodes as either having
physical affinity or not.
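
One quick way to see this is to collect the distinct distance values that
appear in the listing. The sketch below parses lines in the
"nodeN/distance:..." form shown above and returns that set; the helper
names are hypothetical, and the sample is only the first two rows of the
x86/mm output quoted in this mail.

```python
def parse_distances(text):
    """Parse lines like 'node3/distance:10 20 20 10' into {3: [10, 20, 20, 10]}."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, _, values = line.strip().partition("/distance:")
        node = int(name.replace("node", "", 1))
        table[node] = [int(v) for v in values.split()]
    return table


def distance_values(table):
    """Return the set of distinct distances found anywhere in the table."""
    return {d for row in table.values() for d in row}


# First two rows of the x86/mm listing quoted above.
SAMPLE = """\
node0/distance:10 20 20 20 10 20 20 20 10 20 20 20 10 20 20 20
node1/distance:10 10 20 20 10 20 20 20 10 20 20 20 10 20 20 20
"""

if __name__ == "__main__":
    print(sorted(distance_values(parse_distances(SAMPLE))))
```

The physical SLIT contains a distance of 30 between nodes 0 and 3, so a
listing that only ever contains 10 and 20 cannot have been derived from it.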