[PATCH v2 0/2] Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy

From: Donet Tom
Date: Fri Mar 08 2024 - 10:17:35 EST


This patchset optimizes cross-socket memory access under the
MPOL_PREFERRED_MANY policy by allowing pages to be migrated on a
protnone reference.

To test these patches, we ran the following test on a 3-node system:
Node 0 - 2GB - Tier 1
Node 1 - 11GB - Tier 1
Node 6 - 10GB - Tier 2

The following change was made to memcached to set the memory policy.
It selects Node 0 and Node 1 as the preferred nodes.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <numaif.h>
#include <numa.h>

unsigned long nodemask;
int ret;

/* Bits 0 and 1 set: Node 0 and Node 1 are the preferred nodes. */
nodemask = 0x03;
ret = set_mempolicy(MPOL_PREFERRED_MANY | MPOL_F_NUMA_BALANCING,
                    &nodemask, 10);
/* If MPOL_F_NUMA_BALANCING isn't supported,
 * fall back to MPOL_PREFERRED_MANY */
if (ret < 0 && errno == EINVAL) {
    printf("set mem policy normal\n");
    ret = set_mempolicy(MPOL_PREFERRED_MANY, &nodemask, 10);
}
if (ret < 0) {
    perror("Failed to call set_mempolicy");
    exit(-1);
}
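
The mask 0x03 used above is simply bit 0 OR bit 1, one bit per preferred
node. As a minimal sketch of how such a mask could be built from node IDs
(nodemask_from_ids() is a hypothetical helper for illustration; it is not
part of libnuma, memcached, or this patchset, and it only handles node IDs
that fit in a single unsigned long):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (not from libnuma or the patchset): build a
 * single-word nodemask with one bit set per node ID, in the layout
 * that set_mempolicy() reads. */
static unsigned long nodemask_from_ids(const int *ids, size_t count)
{
    unsigned long mask = 0;

    for (size_t i = 0; i < count; i++) {
        /* Only node IDs that fit in one unsigned long are supported. */
        assert(ids[i] >= 0 &&
               ids[i] < (int)(CHAR_BIT * sizeof(unsigned long)));
        mask |= 1UL << ids[i];
    }
    return mask;
}
```

Called with the node IDs {0, 1}, this yields the same 0x03 value that is
hard-coded in the memcached change.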

Test Procedure:
===============
1. Make sure memory tiering and demotion are enabled.
2. Start memcached.

# ./memcached -b 100000 -m 204800 -u root -c 1000000 -t 7 \
    -d -s "/tmp/memcached.sock"

3. Run memtier_benchmark to store 3200000 keys.

# ./memtier_benchmark -S "/tmp/memcached.sock" --protocol=memcache_binary \
    --threads=1 --pipeline=1 --ratio=1:0 --key-pattern=S:S --key-minimum=1 \
    --key-maximum=3200000 -n allkeys -c 1 -R -x 1 -d 1024

4. Start a memory eater on node 0 and 1. This will demote all memcached
pages to node 6.
5. Make sure all the memcached pages got demoted to the lower tier by reading
/proc/<memcached PID>/numa_maps.

# cat /proc/2771/numa_maps
---
default anon=1009 dirty=1009 active=0 N6=1009 kernelpagesize_kB=64
default anon=1009 dirty=1009 active=0 N6=1009 kernelpagesize_kB=64
---

6. Kill memory eater.
7. Read the pgpromote_success counter.
8. Start reading the keys by running memtier_benchmark.

# ./memtier_benchmark -S "/tmp/memcached.sock" --protocol=memcache_binary \
    --pipeline=1 --distinct-client-seed --ratio=0:3 --key-pattern=R:R \
    --key-minimum=1 --key-maximum=3200000 -n allkeys \
    --threads=64 -c 1 -R -x 6

9. Read the pgpromote_success counter.
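
Steps 7 and 9 read a per-node counter. A minimal sketch of how that could be
automated is shown here. It assumes the counters are taken from
/sys/devices/system/node/node<N>/vmstat, which is a guess at how the numbers
below were collected; vmstat_counter() and node_counter() are hypothetical
helper names, not part of the patchset.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan "name value" lines (the layout used by
 * vmstat-style files) and return the value for `name`, or -1 if the
 * counter is not present. */
static long long vmstat_counter(const char *text, const char *name)
{
    size_t name_len = strlen(name);
    const char *line = text;

    while (line && *line) {
        /* Require the full counter name followed by a space. */
        if (strncmp(line, name, name_len) == 0 && line[name_len] == ' ') {
            long long value;

            if (sscanf(line + name_len, "%lld", &value) == 1)
                return value;
        }
        line = strchr(line, '\n');
        if (line)
            line++;    /* step past the newline to the next line */
    }
    return -1;
}

/* Read one counter for one node. The sysfs path is an assumption. */
static long long node_counter(int node, const char *name)
{
    char path[64];
    char buf[16384];
    size_t n;
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/devices/system/node/node%d/vmstat", node);
    f = fopen(path, "r");
    if (!f)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return vmstat_counter(buf, name);
}
```

Calling node_counter(0, "pgpromote_success") before and after the read
benchmark gives the two values whose difference is the number of pages
promoted to Node 0 during the run.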

Test Results:
=============
Without Patch
------------------
1. pgpromote_success before test
Node 0: pgpromote_success 11
Node 1: pgpromote_success 140974

pgpromote_success after test
Node 0: pgpromote_success 11
Node 1: pgpromote_success 140974

2. Memtier-benchmark result.
AGGREGATED AVERAGE RESULTS (6 runs)
===================================================================
Type      Ops/sec   Hits/sec  Misses/sec  Avg. Latency  p50 Latency
-------------------------------------------------------------------
Sets         0.00        ---         ---           ---          ---
Gets    305792.03  305791.93        0.10       0.18949      0.16700
Waits        0.00        ---         ---           ---          ---
Totals  305792.03  305791.93        0.10       0.18949      0.16700

============================================
Type    p99 Latency  p99.9 Latency    KB/sec
--------------------------------------------
Sets            ---            ---      0.00
Gets        0.44700        1.71100  11542.69
Waits           ---            ---       ---
Totals      0.44700        1.71100  11542.69

With Patch
---------------
1. pgpromote_success before test
Node 0: pgpromote_success 5
Node 1: pgpromote_success 89386

pgpromote_success after test
Node 0: pgpromote_success 57895
Node 1: pgpromote_success 141463

2. Memtier-benchmark result.
AGGREGATED AVERAGE RESULTS (6 runs)
===================================================================
Type      Ops/sec   Hits/sec  Misses/sec  Avg. Latency  p50 Latency
-------------------------------------------------------------------
Sets         0.00        ---         ---           ---          ---
Gets    521942.24  521942.07        0.17       0.11459      0.10300
Waits        0.00        ---         ---           ---          ---
Totals  521942.24  521942.07        0.17       0.11459      0.10300

============================================
Type    p99 Latency  p99.9 Latency    KB/sec
--------------------------------------------
Sets            ---            ---      0.00
Gets        0.23100        0.31900  19701.68
Waits           ---            ---       ---
Totals      0.23100        0.31900  19701.68


Test Result Analysis:
=====================
1. With the patch, pages are promoted: pgpromote_success increased on both
Node 0 and Node 1 during the test, whereas it did not change without the
patch.
2. Memtier-benchmark results show that, with the patch, throughput
increased by about 70%.

Ops/sec without fix - 305792.03
Ops/sec with fix - 521942.24
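
As a rough check of the numbers above, the relative throughput change and
the per-node promotion deltas can be recomputed from the reported values.
The sketch below uses hypothetical helper names (percent_change(),
promoted(), print_summary()); only the input numbers come from the results
in this mail.

```c
#include <stdio.h>

/* Relative change from `before` to `after`, in percent. */
static double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Pages promoted during a run: counter after minus counter before. */
static long promoted(long before, long after)
{
    return after - before;
}

/* Print a short summary using the values reported in the test results. */
static void print_summary(void)
{
    printf("Ops/sec change:        %.1f%%\n",
           percent_change(305792.03, 521942.24));
    printf("Avg. latency change:   %.1f%%\n",
           percent_change(0.18949, 0.11459));
    printf("Node 0 promoted pages: %ld\n", promoted(5, 57895));
    printf("Node 1 promoted pages: %ld\n", promoted(89386, 141463));
}
```

With the reported values this gives a throughput gain of roughly 70.7%, and
57890 and 52077 pages promoted to Node 0 and Node 1 respectively with the
patch, compared with zero on both nodes without it.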

Changes:
v2:
- Rebased on latest upstream (v6.8-rc7)
- Used 'numa_node_id()' to get the current execution node ID and added
'lockdep_assert_held' to make sure that 'mpol_misplaced()' is
called with the ptl held.
- Updated the migration condition: migration now occurs only
if the execution node is present in the policy nodemask.

-v1: https://lore.kernel.org/linux-mm/9c3f7b743477560d1c5b12b8c111a584a2cc92ee.1708097962.git.donettom@xxxxxxxxxxxxx/#t
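
For illustration only, the v2 migration condition can be modeled in
userspace as below. This is a simplified sketch, not the kernel code from
the patches: node_in_mask() and may_migrate_to_exec_node() are hypothetical
names, the nodemask is reduced to a single unsigned long, and any further
checks the kernel applies before actually migrating a page are not modeled.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Test whether `node` is set in a single-word nodemask. */
static bool node_in_mask(unsigned long mask, int node)
{
    assert(node >= 0 && node < (int)(CHAR_BIT * sizeof(unsigned long)));
    return (mask >> node) & 1UL;
}

/* Simplified model of the v2 condition: a page is a candidate for
 * migration to the execution node only if that node is part of the
 * policy nodemask and the page is not already there. */
static bool may_migrate_to_exec_node(unsigned long policy_nodes,
                                     int exec_node, int page_node)
{
    if (!node_in_mask(policy_nodes, exec_node))
        return false;    /* execution node is not a preferred node */
    return page_node != exec_node;
}
```

With the test setup above (policy nodemask 0x03), a page on Node 6 touched
by a thread running on Node 0 is a migration candidate in this model, while
a thread running on a node outside the mask does not pull pages toward it.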


Donet Tom (2):
mm/mempolicy: Use numa_node_id() instead of cpu_to_node()
mm/numa_balancing:Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy

include/linux/mempolicy.h | 5 +++--
mm/huge_memory.c | 2 +-
mm/internal.h | 2 +-
mm/memory.c | 8 +++++---
mm/mempolicy.c | 34 ++++++++++++++++++++++++++--------
5 files changed, 36 insertions(+), 15 deletions(-)

--
2.39.3