Re: [RFC PATCH 0/2] mm: mempolicy: Multi-tier interleaving

From: Huang, Ying
Date: Thu Sep 28 2023 - 02:19:12 EST


Hi, Ravi,

Thanks for the patch!

Ravi Jonnalagadda <ravis.opensrc@xxxxxxxxxx> writes:

> From: Ravi Shankar <ravis.opensrc@xxxxxxxxxx>
>
> Hello,
>
> The current interleave policy operates by interleaving page requests
> among nodes defined in the memory policy. To accommodate the
> introduction of memory tiers for various memory types (e.g., DDR, CXL,
> HBM, PMEM, etc.), a mechanism is needed for interleaving page requests
> across these memory types or tiers.

Why do we need to interleave page allocation among memory tiers? I
think you need to make that more explicit. I guess the goal is to
increase the maximum memory bandwidth available to workloads?

> This can be achieved by implementing an interleaving method that
> considers the tier weights. The tier weight determines the proportion
> of allocations that select nodes of that tier from among the nodes
> specified in the memory policy. A tier weight can be assigned to each
> memory type within the system.

What is the problem with the original interleaving? I think you need to
make that explicit too.
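
To make sure I understand the proposed selection logic, here is a
minimal userspace sketch of weighted round-robin node selection. All
names and the two hard-coded per-node weights are illustrative
assumptions of mine, not code from the patch:

#include <stdio.h>

/* Illustrative only: per-node weights derived from the tier weights,
 * e.g. DDR node 0 -> 85, CXL node 1 -> 15. */
static const unsigned int node_weight[] = { 85, 15 };
#define NR_NODES (sizeof(node_weight) / sizeof(node_weight[0]))

/* Pick the node for the nth interleaved page: walk the nodes,
 * spending 'weight' consecutive pages on each before moving on. */
static unsigned int weighted_interleave_node(unsigned long n)
{
        unsigned long total = 0;
        unsigned int node;

        for (node = 0; node < NR_NODES; node++)
                total += node_weight[node];

        n %= total;             /* position within one full cycle */
        for (node = 0; node < NR_NODES; node++) {
                if (n < node_weight[node])
                        return node;
                n -= node_weight[node];
        }
        return 0;               /* unreachable if all weights > 0 */
}

int main(void)
{
        unsigned long counts[NR_NODES] = { 0 };

        for (unsigned long page = 0; page < 1000; page++)
                counts[weighted_interleave_node(page)]++;

        /* Prints 850 vs 150, i.e. the 85:15 proportion. */
        printf("node 0: %lu pages, node 1: %lu pages\n",
               counts[0], counts[1]);
        return 0;
}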

> Hasan Al Maruf had put forth a proposal for interleaving between two
> tiers, namely the top tier and the low tier. However, this patch was
> not adopted due to constraints on the number of available tiers.
>
> https://lore.kernel.org/linux-mm/YqD0%2FtzFwXvJ1gK6@xxxxxxxxxxx/T/
>
> New proposed changes:
>
> 1. Introduce a sysfs entry to allow setting the interleave weight for each
> memory tier.
> 2. Give each tier a default weight of 1, indicating a standard 1:1
> proportion.
> 3. Distribute each tier's weight uniformly across all of its nodes.
> 4. Modify the existing interleaving algorithm to support multi-tier
> interleaving based on tier weights.
>
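For the sysfs entry in point 1, I imagine a per-tier attribute roughly
like the sketch below. The to_memory_tier() helper and the
interleave_weight field are my assumptions about the implementation,
not the actual patch code:

/* Sketch only: assumes struct memory_tier gains an
 * interleave_weight field. */
static ssize_t interleave_weight_show(struct device *dev,
                                      struct device_attribute *attr,
                                      char *buf)
{
        struct memory_tier *tier = to_memory_tier(dev);

        return sysfs_emit(buf, "%u\n", tier->interleave_weight);
}

static ssize_t interleave_weight_store(struct device *dev,
                                       struct device_attribute *attr,
                                       const char *buf, size_t count)
{
        struct memory_tier *tier = to_memory_tier(dev);
        unsigned int weight;
        int err;

        err = kstrtouint(buf, 10, &weight);
        if (err)
                return err;
        if (!weight)            /* a zero weight would starve the tier */
                return -EINVAL;

        tier->interleave_weight = weight;
        return count;
}
static DEVICE_ATTR_RW(interleave_weight);
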
> This is in line with Huang, Ying's presentation at LPC 2022, slide 16:
> https://lpc.events/event/16/contributions/1209/attachments/1042/1995/\
> Live%20In%20a%20World%20With%20Multiple%20Memory%20Types.pdf

Thanks for referring to the original work on this.

> We observed a significant increase (165%) in bandwidth utilization
> with the newly proposed multi-tier interleaving compared to the
> traditional 1:1 interleaving approach between DDR and CXL tier nodes,
> with 85% of the bandwidth allocated to the DDR tier and 15% to the
> CXL tier, using the MLC -w2 option.

It appears that "mlc" isn't open source software. It would be better to
test with an open source tool. And, even better, to use more practical
workloads instead of a memory bandwidth/latency measurement tool.

> Usage Example:
>
> 1. Set weights for the DDR (tier4) and CXL (tier22) tiers.
> echo 85 > /sys/devices/virtual/memory_tiering/memory_tier4/interleave_weight
> echo 15 > /sys/devices/virtual/memory_tiering/memory_tier22/interleave_weight
>
> 2. Interleave between DDR (tier4, node-0) and CXL (tier22, node-1) using numactl:
> numactl -i0,1 mlc --loaded_latency W2
>
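
If I understand the weights correctly, each full interleave cycle then
covers 85 + 15 = 100 pages: 85 allocated from node 0 (DDR) and 15 from
node 1 (CXL), instead of the default 50/50 split.
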
> Srinivasulu Thanneeru (2):
> memory tier: Introduce sysfs for tier interleave weights.
> mm: mempolicy: Interleave policy for tiered memory nodes
>
> include/linux/memory-tiers.h | 27 ++++++++-
> include/linux/sched.h | 2 +
> mm/memory-tiers.c | 67 +++++++++++++++-------
> mm/mempolicy.c | 107 +++++++++++++++++++++++++++++++++--
> 4 files changed, 174 insertions(+), 29 deletions(-)

--
Best Regards,
Huang, Ying