Re: [PATCH 2/2] mm/damon: Add a new scheme to support demotion on tiered memory system

From: SeongJae Park
Date: Wed Dec 22 2021 - 03:43:30 EST


On Tue, 21 Dec 2021 22:18:06 +0800 Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> wrote:

> Hi SeongJae,
>
> On 12/21/2021 9:24 PM, SeongJae Park Wrote:
> > Hi Baolin,
> >
> >
> > Thank you for this great patch! I left some comments below.
> >
> > On Tue, 21 Dec 2021 17:18:04 +0800 Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> wrote:
> >
> >> On tiered memory systems, the reclaim path in shrink_page_list() already
> >> supports demoting pages to a slow memory node instead of discarding the
> >> pages. However, by that time the fast memory node's watermark is
> >> already under pressure, which will increase the memory allocation
> >> latency during page demotion.
> >>
> >> We can rely on DAMON, driven from user space, to monitor the cold
> >> memory on the fast memory node, and demote the cold pages to the slow
> >> memory node proactively to keep the fast memory node in a healthy state.
> >> Thus this patch introduces a new scheme named DAMOS_DEMOTE to support
> >> this feature.
> >>
> >> Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
> >> ---
[...]
> >> + */
> >> +static int damos_demote(struct damon_target *target, struct damon_region *r)
> >> +{
> >> + struct mm_struct *mm;
> >> + LIST_HEAD(demote_pages);
> >> + LIST_HEAD(pagelist);
> >> + int target_nid, nr_pages, ret = 0;
> >> + unsigned int nr_succeeded, demoted_pages = 0;
> >> + struct page *page, *page2;
> >> +
> >> + /* Check whether page demotion is allowed */
> >> + if (!numa_demotion_enabled)
> >> + return -EINVAL;
> >> +
> >> + mm = damon_get_mm(target);
> >> + if (!mm)
> >> + return -ENOMEM;
> >> +
> >> + mmap_read_lock(mm);
> >> + walk_page_range(mm, PAGE_ALIGN(r->ar.start), PAGE_ALIGN(r->ar.end),
[...]
> >> + &damos_migrate_pages_walk_ops, &demote_pages);
> >> + mmap_read_unlock(mm);
> >> +
> >> + mmput(mm);
> >> + if (list_empty(&demote_pages))
> >> + return 0;
> >> +
> >> + list_for_each_entry_safe(page, page2, &demote_pages, lru) {
> >
> > I'd prefer 'next' or 'next_page' instead of 'page2'.
>
> Sure.
>
> >
> >> + list_add(&page->lru, &pagelist);
> >> + target_nid = next_demotion_node(page_to_nid(page));
> >> + nr_pages = thp_nr_pages(page);
> >> +
> >> + ret = migrate_pages(&pagelist, alloc_demote_page, NULL,
> >> + target_nid, MIGRATE_SYNC, MR_DEMOTION,
> >> + &nr_succeeded);
> >> + if (ret) {
> >> + if (!list_empty(&pagelist)) {
> >> + list_del(&page->lru);
> >> + mod_node_page_state(page_pgdat(page),
> >> + NR_ISOLATED_ANON + page_is_file_lru(page),
> >> + -nr_pages);
> >> + putback_lru_page(page);
> >> + }
> >> + } else {
> >> + __count_vm_events(PGDEMOTE_DIRECT, nr_succeeded);
> >> + demoted_pages += nr_succeeded;
> >> + }
> >
> > Why should we do above work for each page on our own instead of exporting and
> > calling 'demote_page_list()'?
>
> Because demote_page_list() is used to demote cold pages which all come
> from the same fast memory node, they can share the same target demotion
> node.
>
> But for the regions of a process, we can get cold pages from different
> fast memory nodes (suppose we have multiple DRAM nodes on the system),
> so their target demotion nodes may differ. Thus we should migrate each
> cold page after looking up its correct target demotion node.

Thank you for the kind explanation. But, I still want to reuse the code rather
than copying it, if possible. How about the dumb ideas below, off the top of
my head?

1. Call demote_page_list() for each page from here
2. Call demote_page_list() for each page from damos_migrate_pmd_entry()
3. Make damos_migrate_pages_walk_ops build multiple demote_pages lists,
one per fast memory node, and call demote_page_list() here for each
per-fast-memory-node demote_pages list.
4. Make demote_page_list() handle this case and use it, e.g., a NULL pgdat
parameter could mean the pages are not from the same node (see the rough
sketch below).
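
For idea 4, I imagine something like the below. This is only a rough
direction and not even build tested; it assumes demote_page_list() in
mm/vmscan.c gets exported for DAMON, drops the PGDEMOTE_* event counting
for brevity, and puts left-over pages back by itself instead of leaving
that to the caller as the current code does:

/*
 * Rough sketch (untested): a NULL @pgdat means the pages on
 * @demote_pages are not necessarily from one source node, so the
 * demotion target must be looked up per page.
 */
static unsigned int demote_page_list(struct list_head *demote_pages,
				     struct pglist_data *pgdat)
{
	unsigned int nr_succeeded = 0, nr_demoted = 0;
	struct page *page, *next;
	int target_nid;

	if (list_empty(demote_pages))
		return 0;

	if (pgdat) {
		/* Current behavior: one source node, one target node. */
		target_nid = next_demotion_node(pgdat->node_id);
		if (target_nid == NUMA_NO_NODE)
			return 0;
		migrate_pages(demote_pages, alloc_demote_page, NULL,
			      target_nid, MIGRATE_ASYNC, MR_DEMOTION,
			      &nr_succeeded);
		return nr_succeeded;
	}

	/* Pages may come from different fast memory nodes. */
	list_for_each_entry_safe(page, next, demote_pages, lru) {
		LIST_HEAD(pagelist);

		target_nid = next_demotion_node(page_to_nid(page));
		if (target_nid == NUMA_NO_NODE)
			continue;

		list_move(&page->lru, &pagelist);
		migrate_pages(&pagelist, alloc_demote_page, NULL,
			      target_nid, MIGRATE_SYNC, MR_DEMOTION,
			      &nr_succeeded);
		nr_demoted += nr_succeeded;
		/* Pages that failed to migrate are still isolated. */
		if (!list_empty(&pagelist))
			putback_movable_pages(&pagelist);
	}
	/* Pages that had no demotion target also go back to the LRU. */
	if (!list_empty(demote_pages))
		putback_movable_pages(demote_pages);

	return nr_demoted;
}

The MIGRATE_SYNC mode in the per-page path just mirrors what this patch
uses; MIGRATE_ASYNC as in the current reclaim path might also be fine.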

Please let me know if I'm missing something.


Thanks,
SJ

>
> Thanks for your comments.