Re: [RFC PATCH v2 0/3] mm: mempolicy: Multi-tier weighted interleaving

From: Huang, Ying
Date: Wed Oct 18 2023 - 04:31:12 EST


Gregory Price <gregory.price@xxxxxxxxxxxx> writes:

> On Mon, Oct 16, 2023 at 03:57:52PM +0800, Huang, Ying wrote:
>> Gregory Price <gourry.memverge@xxxxxxxxx> writes:
>>
>> > == Mutex to Semaphore change:
>> >
>> > Since it is expected that many threads will be accessing this data
>> > during allocations, a mutex is not appropriate.
>>
>> IIUC, this is a change for performance. If so, please show some
>> performance data.
>>
>
> This change will be dropped in v3 in favor of the existing
> RCU mechanism in memory-tiers.c as pointed out by Matthew.
>
>> > == Source-node relative weighting:
>> >
>> > 1. Set weights for DDR (tier4) and CXL(teir22) tiers.
>> > echo source_node:weight > /path/to/interleave_weight
>>
>> If source_node is considered, why not consider target_node too? On a
>> system with only 1 tier (DRAM), do you want weighted interleaving among
>> NUMA nodes? If so, why tie weighted interleaving with memory tiers?
>> Why not just introduce weighted interleaving for NUMA nodes?
>>
>
> The short answer: Practicality and ease-of-use.
>
> The long answer: We have been discussing how to make this more flexible..
>
> Personally, I agree with you. If Task A is on Socket 0, the weight on
> Socket 0 DRAM should not be the same as the weight on Socket 1 DRAM.
> However, right now, DRAM nodes are lumped into the same tier together,
> resulting in them having the same weight.
>
> If you scrollback through the list, you'll find an RFC I posted for
> set_mempolicy2 which implements weighted interleave in mm/mempolicy.
> However, mm/mempolicy is extremely `current-centric` at the moment,
> so that makes changing weights at runtime (in response to a hotplug
> event, for example) very difficult.
>
> I still think there is room to extend set_mempolicy to allow
> task-defined weights to take preference over tier defined weights.
>
> We have discussed adding the following features to memory-tiers:
>
> 1) breaking up tiers to allow 1 tier per node, as opposed to defaulting
> to lumping all nodes of a simlar quality into the same tier
>
> 2) enabling movemnet of nodes between tiers (for the purpose of
> reconfiguring due to hotplug and other situations)
>
> For users that require fine-grained control over each individual node,
> this would allow for weights to be applied per-node, because a
> node=tier. For the majority of use cases, it would allow clumping of
> nodes into tiers based on physical topology and performance class, and
> then allow for the general weighting to apply. This seems like the most
> obvious use-case that a majority of users would use, and also the
> easiest to set-up in the short term.
>
> That said, there are probably 3 or 4 different ways/places to implement
> this feature. The question is what is the clear and obvious way?
> I don't have a definitive answer for that, hence the RFC.
>
> There are at least 5 proposals that i know of at the moment
>
> 1) mempolicy
> 2) memory-tiers
> 3) memory-block interleaving? (weighting among blocks inside a node)
> Maybe relevant if Dynamic Capacity devices arrive, but it seems
> like the wrong place to do this.
> 4) multi-device nodes (e.g. cxl create-region ... mem0 mem1...)
> 5) "just do it in hardware"

It may be easier to start with the use case. What is the practical use
cases in your mind that can not be satisfied with simple per-memory-tier
weight? Can you compare the memory layout with different proposals?

>> > # Set tier4 weight from node 0 to 85
>> > echo 0:85 > /sys/devices/virtual/memory_tiering/memory_tier4/interleave_weight
>> > # Set tier4 weight from node 1 to 65
>> > echo 1:65 > /sys/devices/virtual/memory_tiering/memory_tier4/interleave_weight
>> > # Set tier22 weight from node 0 to 15
>> > echo 0:15 > /sys/devices/virtual/memory_tiering/memory_tier22/interleave_weight
>> > # Set tier22 weight from node 1 to 10
>> > echo 1:10 > /sys/devices/virtual/memory_tiering/memory_tier22/interleave_weight

--
Best Regards,
Huang, Ying