Re: [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory

From: Honggyu Kim
Date: Wed Jan 17 2024 - 06:50:33 EST


Hi SeongJae,

Thanks very much for your comments in details.

On Tue, 16 Jan 2024 12:31:59 -0800 SeongJae Park <sj@xxxxxxxxxx> wrote:

> Thank you so much for this great patches and the above nice test results. I
> believe the test setup and results make sense, and merging a revised version of
> this patchset would provide real benefits to the users.

Glad to hear that!

> In a high level, I think it might better to separate DAMON internal changes
> from DAMON external changes.

I agree. I can't guarantee but I can move all the external changes
inside mm/damon, but will try that as much as possible.

> For DAMON part changes, I have no big concern other than trivial coding style
> level comments.

Sure. I will fix those.

> For DAMON-external changes that implementing demote_pages() and
> promote_pages(), I'm unsure if the implementation is reusing appropriate
> functions, and if those are placee in right source file. Especially, I'm
> unsure if vmscan.c is the right place for promotion code. Also I don't know if
> there is a good agreement on the promotion/demotion target node decision. That
> should be because I'm not that familiar with the areas and the files, but I
> feel this might because our discussions on the promotion and the demotion
> operations are having rooms for being more matured. Because I'm not very
> faimiliar with the part, I'd like to hear others' comments, too.

I would also like to hear others' comments, but this might not be needed
if most of external code can be moved to mm/damon.

> To this end, I feel the problem might be able te be simpler, because this
> patchset is trying to provide two sophisticated operations, while I think a
> simpler approach might be possible. My humble simpler idea is adding a DAMOS
> operation for moving pages to a given node (like sys_move_phy_pages RFC[1]),
> instead of the promote/demote. Because the general pages migration can handle
> multiple cases including the promote/demote in my humble assumption.

My initial implementation was similar but I found that it's not accurate
enough due to the nature of inaccuracy of DAMON regions. I saw that
many pages were demoted and promoted back and forth because migration
target regions include both hot and cold pages together.

So I have implemented the demotion and promotion logics based on the
shrink_folio_list, which contains many corner case handling logics for
reclaim.

Having the current demotion and promotion logics makes the hot/cold
migration pretty accurate as expected. We made a simple program called
"hot_cold" and it receives 2 arguments for hot size and cold size in MB.
For example, "hot_cold 200 500" allocates 200MB of hot memory and 500MB
of cold memory. It basically allocates 2 large blocks of memory with
mmap, then repeat memset for the initial 200MB to make it accessed in an
infinite loop.

Let's say there are 3 nodes in the system and the first node0 and node1
are the first tier, and node2 is the second tier.

$ cat /sys/devices/virtual/memory_tiering/memory_tier4/nodelist
0-1

$ cat /sys/devices/virtual/memory_tiering/memory_tier22/nodelist
2

Here is the result of partitioning hot/cold memory and I put execution
command at the right side of numastat result. I initially ran each
hot_cold program with preferred setting so that they initially allocate
memory on one of node0 or node2, but they gradually migrated based on
their access frequencies.

$ numastat -c -p hot_cold
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Node 2 Total
--------------- ------ ------ ------ -----
754 (hot_cold) 1800 0 2000 3800 <- hot_cold 1800 2000
1184 (hot_cold) 300 0 500 800 <- hot_cold 300 500
1818 (hot_cold) 801 0 3199 4000 <- hot_cold 800 3200
30289 (hot_cold) 4 0 5 10 <- hot_cold 3 5
30325 (hot_cold) 31 0 51 81 <- hot_cold 30 50
--------------- ------ ------ ------ -----
Total 2938 0 5756 8695

The final node placement result shows that DAMON accurately migrated
pages by their hotness for multiple processes.

> In more detail, users could decide which is the appropriate node for promotion
> or demotion and use the new DAMOS action to do promotion and demotion. Users
> would requested to decide which node is the proper promotion/demotion target
> nodes, but that decision wouldn't be that hard in my opinion.
>
> For this, 'struct damos' would need to be updated for such argument-dependent
> actions, like 'struct damos_filter' is haing a union.

That might be a better solution. I will think about it.

> In future, we could extend the operation to the promotion and the demotion
> after the dicussion around the promotion and demotion is matured, if required.
> And assuming DAMON be extended for originating CPU-aware access monitoring, the
> new DAMOS action would also cover more use cases such as general NUMA nodes
> balancing (extending DAMON for CPU-aware monitoring would required), and some
> complex configurations where having both CPU affinity and tiered memory. I
> also think that may well fit with my RFC idea[2] for tiered memory management.
>
> Looking forward to opinions from you and others. I admig I miss many things,
> and more than happy to be enlightened.
>
> [1] https://lwn.net/Articles/944007/
> [2] https://lore.kernel.org/damon/20231112195602.61525-1-sj@xxxxxxxxxx/

Thanks very much for your comments. I will need a few more days for the
update but will try to address your concerns as much as possible.

Thanks,
Honggyu