Re: [RFC PATCH v3 0/4] Node Weights and Weighted Interleave

From: Ravi Jonnalagadda
Date: Thu Nov 02 2023 - 05:36:08 EST


Whether the node based interleave solution should be considered complex would
probably depend on the number of NUMA nodes present in the system and on whether
we are able to set up the default weights correctly to obtain optimum bandwidth
expansion.

>
>> Pros and Cons of Memory Tier based interleave:
>> Pros:
>> 1. Programming weight per initiator would apply for all the nodes in the tier.
>> 2. Weights can be calculated considering the cumulative bandwidth of all
>> the nodes in the tier and need to be programmed once for all the nodes in a
>> given tier.
>> 3. It may be useful in cases where numa nodes with similar latency and bandwidth
>> characteristics increase, possibly with pooling use cases.
>
>4. simpler.
>
>> Cons:
>> 1. If nodes with different bandwidth and latency characteristics are placed
>> in same tier as seen in the current mainline kernel, it will be difficult to
>> apply a correct interleave weight policy.
>> 2. There will be a need for functionality to move nodes between different tiers
>> or create new tiers to place such nodes for programming correct interleave weights.
>> We are working on a patch to support it currently.
>
>Thanks! If we have such system, we will need this.
>
>> 3. For systems where each numa node is having different characteristics,
>> a single node might end up existing in different memory tier, which would be
>> equivalent to node based interleaving.
>
>No. A node can only exist in one memory tier.

Sorry for the confusion. What I meant was: if each node has different
characteristics, then to program the memory tier weights correctly we need to
place each node in a separate tier of its own. Each memory tier would then
contain only a single node, and the solution would resemble node based
interleaving.
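
As a rough illustration of that point, here is a minimal sketch (not the
code from this patch series) that derives one weight per tier from the
cumulative bandwidth of the nodes in the tier. The function name
tier_weights, the tier names, and the bandwidth figures are all
hypothetical and chosen only to make the arithmetic easy to follow.

```python
from functools import reduce
from math import gcd


def tier_weights(tiers, node_bw):
    """Return one interleave weight per tier.

    tiers:   dict mapping a tier name to the list of node ids in it
    node_bw: dict mapping a node id to an integer bandwidth figure
             (hypothetical numbers, any consistent unit)
    """
    # Cumulative bandwidth of all nodes placed in each tier.
    totals = {t: sum(node_bw[n] for n in nodes) for t, nodes in tiers.items()}
    # Reduce the totals to the smallest integer ratio.
    g = reduce(gcd, totals.values())
    return {t: bw // g for t, bw in totals.items()}


# Hypothetical system: node 0 is DRAM, nodes 1 and 2 are CXL devices
# with different bandwidth.
node_bw = {0: 300, 1: 60, 2: 30}

# Nodes 1 and 2 share a tier, so they also share a single weight.
grouped = tier_weights({"dram": [0], "cxl": [1, 2]}, node_bw)

# One tier per node: the tier weights are now simply per-node weights.
split = tier_weights({"t0": [0], "t1": [1], "t2": [2]}, node_bw)

print(grouped)
print(split)
```

With one node per tier, the tier weights carry exactly the same
information as per-node weights, which is the equivalence I was trying
to describe.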

>
>> On newer systems where all CXL memory from different devices under a
>> port are combined to form single numa node, this scenario might be
>> applicable.
>
>You mean the different memory ranges of a NUMA node may have different
>performance? I don't think that we can deal with this.

Example configuration: on a server that we are using now, four different
CXL cards are combined to form a single NUMA node and two other cards are
exposed as two individual NUMA nodes.
So if we have the ability to combine multiple CXL memory ranges into a
single NUMA node, the number of NUMA nodes in the system would potentially
decrease even if we can't combine the entire range to form a single node.

>
>> 4. Users may need to keep track of different memory tiers and what nodes are present
>> in each tier for invoking interleave policy.
>
>I don't think this is a con. With node based solution, you need to know
>your system too.
>
>>>
>>>> Could you elaborate on the 'get what you pay for' usecase you
>>>> mentioned?
>>>
>
>--
>Best Regards,
>Huang, Ying
--
Best Regards,
Ravi Jonnalagadda