Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2

From: David Rientjes
Date: Tue Sep 22 2009 - 03:59:35 EST

Next message: Kirill A. Shutemov: "[PATCH v4, REBASED 1/2] ARM: Pass IFSR register to do_PrefetchAbort()"
Previous message: Zdenek Kabelac: "Re: [PATCH] Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs"
In reply to: Christoph Lameter: "Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2"
Next in thread: Benjamin Herrenschmidt: "Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2"

On Tue, 22 Sep 2009, Christoph Lameter wrote:

> How would you deal with a memoryless node that has let's say 4 processors
> and some I/O devices? Now the memory policy is round robin and there are 4
> nodes at the same distance with 4G memory each. Does one of the nodes now
> become privileged under your plan? How do you equally use memory from all
> these nodes?
>

If the distance between the memoryless node with the cpus/devices and all
4G nodes is the same, then this is UMA and no abstraction is necessary:
there's no reason to support interleaving of memory allocations amongst
four different regions of memory if there's no difference in latencies to
those regions.

It is possible, however, to have a system configured in such a way that
representing all devices, including memory, at a single level of
abstraction isn't possible. An example is a four-cpu system where cpus
0-1 have local distance to all memory and cpus 2-3 have remote distance.

A solution would be to abstract everything into "system localities" like
the ACPI specification does. These localities in my plan are slightly
different, though: they are limited to only a single class of device.

A locality is simply an aggregate of a particular type of device; a device
is bound to a locality if it shares the same proximity as all other
devices in that locality to all other localities. In other words, the
previous example would have two cpu localities: one with cpus 0-1 and one
with cpus 2-3. If cpu 0 had a different proximity than cpu 1 to a pci
bus, however, there would be three cpu localities.
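
The grouping rule above can be sketched in C. This is only an illustration
under stated assumptions, not code from the patch series: the table layout,
the LOCAL/REMOTE values, the two example tables and build_cpu_localities()
are all hypothetical names made up for this sketch. Each cpu gets one row of
distances to the other localities (here one memory locality and one pci
bus), and two cpus share a locality only if their rows are identical.

```c
#include <assert.h>
#include <string.h>

#define NR_CPUS   4   /* hypothetical: cpus 0-3 from the example */
#define NR_OTHERS 2   /* hypothetical: column 0 = memory, column 1 = pci bus */

/* Hypothetical distance classes; a real table would carry SLIT values. */
enum { LOCAL = 10, REMOTE = 20 };

/* cpus 0-1 are local to memory and the bus, cpus 2-3 are remote to both. */
static const int example_two[NR_CPUS][NR_OTHERS] = {
    { LOCAL,  LOCAL  },
    { LOCAL,  LOCAL  },
    { REMOTE, REMOTE },
    { REMOTE, REMOTE },
};

/* Same as above, except cpu 1 now differs from cpu 0 in bus proximity. */
static const int example_three[NR_CPUS][NR_OTHERS] = {
    { LOCAL,  LOCAL  },
    { LOCAL,  REMOTE },
    { REMOTE, REMOTE },
    { REMOTE, REMOTE },
};

/*
 * Assign each cpu a locality id. Two cpus share a locality only if their
 * distance rows to every other locality are identical.
 * Returns the number of cpu localities found.
 */
static int build_cpu_localities(const int dist[NR_CPUS][NR_OTHERS],
                                int locality_of[NR_CPUS])
{
    int nr_localities = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        int found = -1;

        /* Look for an earlier cpu with the same distance row. */
        for (int prev = 0; prev < cpu; prev++) {
            if (memcmp(dist[cpu], dist[prev], sizeof(dist[cpu])) == 0) {
                found = locality_of[prev];
                break;
            }
        }
        locality_of[cpu] = (found >= 0) ? found : nr_localities++;
    }

    assert(nr_localities >= 1 && nr_localities <= NR_CPUS);
    return nr_localities;
}
```

With example_two the cpus fall into two localities (0-1 and 2-3); with
example_three, cpu 1 splits off and there are three.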

The equivalent of proximity domains then describes the distance between
all localities. These distances need not be symmetric: the distance in
one direction may differ from the distance in the opposite direction,
just as ACPI PXMs allow.
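
As a rough illustration of direction-dependent distances, the table below is
indexed as distance[from][to], in the style of a SLIT. The three localities,
the numeric values and both helper functions are assumptions invented for
this sketch; nothing here is taken from ACPI tables of a real machine.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: 0 = cpu locality A, 1 = cpu locality B, 2 = a memory node. */
#define NR_LOCALITIES 3

/* locality_distance[from][to]; illustrative values only. */
static const int locality_distance[NR_LOCALITIES][NR_LOCALITIES] = {
    { 10, 20, 10 },
    { 20, 10, 20 },
    { 15, 20, 10 },   /* node -> A is 15, although A -> node is 10 */
};

/* Distance from one locality to another, in that direction only. */
static int distance_between(int from, int to)
{
    assert(from >= 0 && from < NR_LOCALITIES);
    assert(to >= 0 && to < NR_LOCALITIES);
    return locality_distance[from][to];
}

/* True if the distance is the same in both directions for this pair. */
static bool distance_is_symmetric(int a, int b)
{
    return distance_between(a, b) == distance_between(b, a);
}
```

Keeping the full matrix, rather than only its upper triangle, is what lets
the two directions differ.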

A "node" in this plan is simply a system locality consisting of memory.

For subsystems such as slab allocators, all we require is cpu_to_node()
tables which would map cpu localities to nodes and describe them in terms
of local or remote distance (or whatever the SLIT says, if provided). All
present-day information can still be represented in this model; we've just
added additional layers of abstraction internally.
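
A minimal sketch of such a table lookup, assuming each cpu has already been
assigned to a cpu locality and that a per-locality row of distances to each
node exists. sketch_cpu_to_node(), the two tables and their values are
hypothetical and are not the kernel's cpu_to_node() implementation; the
"pick the nearest node" policy is likewise only one possible choice.

```c
#include <assert.h>
#include <limits.h>

#define NR_CPUS           4   /* hypothetical */
#define NR_CPU_LOCALITIES 2   /* hypothetical */
#define NR_NODES          2   /* hypothetical: localities consisting of memory */

/* Which cpu locality each cpu belongs to (cpus 0-1 and cpus 2-3). */
static const int cpu_locality[NR_CPUS] = { 0, 0, 1, 1 };

/*
 * node_distance[cpu locality][node]; illustrative values only.
 * cpu locality 1 has no local memory: it is remote to both nodes.
 */
static const int node_distance[NR_CPU_LOCALITIES][NR_NODES] = {
    { 10, 20 },
    { 30, 20 },
};

/* Map a cpu to the node nearest its locality; ties go to the lowest id. */
static int sketch_cpu_to_node(int cpu)
{
    int best = -1;
    int best_dist = INT_MAX;

    assert(cpu >= 0 && cpu < NR_CPUS);

    const int *row = node_distance[cpu_locality[cpu]];

    for (int node = 0; node < NR_NODES; node++) {
        if (row[node] < best_dist) {
            best_dist = row[node];
            best = node;
        }
    }
    return best;
}
```

Here cpus 0-1 map to node 0 at local distance, while cpus 2-3, which sit in
a memoryless locality, map to node 1 at remote distance.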
