Re: [PATCH RFC 3/4] mm: move lazy free pages to inactive list

From: Michal Hocko
Date: Tue Feb 24 2015 - 11:14:15 EST

Next message: Greg Rose: "Re: [PATCH 0/8] Adds Intel FieldsPeak NFC solution driver"
Previous message: Steven Rostedt: "Re: [tip:sched/core] sched/rt: Reduce rq lock contention by eliminating locking of non-feasible target"
In reply to: Minchan Kim: "[PATCH RFC 3/4] mm: move lazy free pages to inactive list"
Next in thread: Minchan Kim: "Re: [PATCH RFC 3/4] mm: move lazy free pages to inactive list"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue 24-02-15 17:18:16, Minchan Kim wrote:
> MADV_FREE is hint that it's okay to discard pages if memory is
> pressure and we uses reclaimers(ie, kswapd and direct reclaim)

s@if memory is pressure@if there is memory pressure@

> to free them so there is no worth to remain them in active
> anonymous LRU list so this patch moves them to inactive LRU list.

Makes sense to me.

> A arguable issue for the approach is whether we should put it
> head or tail in inactive list

Is it really arguable? Why should active MADV_FREE pages appear before
those which were living on the inactive list. This doesn't make any
sense to me.

> and selected it as head because
> kernel cannot make sure it's really cold or warm for every usecase
> but at least we know it's not hot so landing of inactive head
> would be comprimise if it stayed in active LRU.

This is really hard to read. What do you think about the following
wording?
"
The active status of those pages is cleared and they are moved to the
head of the inactive LRU. This means that MADV_FREE-ed pages which
were living on the inactive list are reclaimed first because they
are more likely to be cold rather than recently active pages.
"

> As well, if we put recent hinted pages to inactive's tail,
> VM could discard cache hot pages, which would be bad.
>
> As a bonus, we don't need to move them back and forth in inactive
> list whenever MADV_SYSCALL syscall is called.
>
> As drawback, VM should scan more pages in inactive anonymous LRU
> to discard but it has happened all the time if recent reference
> happens on those pages in inactive LRU list so I don't think
> it's not a main drawback.

Rather than the above paragraphs I would like to see a description why
this is needed. Something like the following?
"
This is fixing a suboptimal behavior of MADV_FREE when pages living on
the active list will sit there for a long time even under memory
pressure while the inactive list is reclaimed heavily. This basically
breaks the whole purpose of using MADV_FREE to help the system to free
memory which is might not be used.
"

> Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>

Other than that the patch looks good to me.
Acked-by: Michal Hocko <mhocko@xxxxxxx>

> ---
> include/linux/swap.h | 1 +
> mm/madvise.c | 2 ++
> mm/swap.c | 35 +++++++++++++++++++++++++++++++++++
> 3 files changed, 38 insertions(+)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index cee108cbe2d5..0428e4c84e1d 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -308,6 +308,7 @@ extern void lru_add_drain_cpu(int cpu);
> extern void lru_add_drain_all(void);
> extern void rotate_reclaimable_page(struct page *page);
> extern void deactivate_file_page(struct page *page);
> +extern void deactivate_page(struct page *page);
> extern void swap_setup(void);
>
> extern void add_page_to_unevictable_list(struct page *page);
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 81bb26ecf064..6176039c69e4 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -324,6 +324,8 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
> ptent = pte_mkold(ptent);
> ptent = pte_mkclean(ptent);
> set_pte_at(mm, addr, pte, ptent);
> + if (PageActive(page))
> + deactivate_page(page);
> tlb_remove_tlb_entry(tlb, pte, addr);
> }
> arch_leave_lazy_mmu_mode();
> diff --git a/mm/swap.c b/mm/swap.c
> index 5b2a60578f9c..393968c33667 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -43,6 +43,7 @@ int page_cluster;
> static DEFINE_PER_CPU(struct pagevec, lru_add_pvec);
> static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
> static DEFINE_PER_CPU(struct pagevec, lru_deactivate_file_pvecs);
> +static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
>
> /*
> * This path almost never happens for VM activity - pages are normally
> @@ -789,6 +790,23 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec,
> update_page_reclaim_stat(lruvec, file, 0);
> }
>
> +
> +static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
> + void *arg)
> +{
> + if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
> + int file = page_is_file_cache(page);
> + int lru = page_lru_base_type(page);
> +
> + del_page_from_lru_list(page, lruvec, lru + LRU_ACTIVE);
> + ClearPageActive(page);
> + add_page_to_lru_list(page, lruvec, lru);
> +
> + __count_vm_event(PGDEACTIVATE);
> + update_page_reclaim_stat(lruvec, file, 0);
> + }
> +}
> +
> /*
> * Drain pages out of the cpu's pagevecs.
> * Either "cpu" is the current CPU, and preemption has already been
> @@ -815,6 +833,10 @@ void lru_add_drain_cpu(int cpu)
> if (pagevec_count(pvec))
> pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL);
>
> + pvec = &per_cpu(lru_deactivate_pvecs, cpu);
> + if (pagevec_count(pvec))
> + pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
> +
> activate_page_drain(cpu);
> }
>
> @@ -844,6 +866,18 @@ void deactivate_file_page(struct page *page)
> }
> }
>
> +void deactivate_page(struct page *page)
> +{
> + if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
> + struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
> +
> + page_cache_get(page);
> + if (!pagevec_add(pvec, page))
> + pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
> + put_cpu_var(lru_deactivate_pvecs);
> + }
> +}
> +
> void lru_add_drain(void)
> {
> lru_add_drain_cpu(get_cpu());
> @@ -873,6 +907,7 @@ void lru_add_drain_all(void)
> if (pagevec_count(&per_cpu(lru_add_pvec, cpu)) ||
> pagevec_count(&per_cpu(lru_rotate_pvecs, cpu)) ||
> pagevec_count(&per_cpu(lru_deactivate_file_pvecs, cpu)) ||
> + pagevec_count(&per_cpu(lru_deactivate_pvecs, cpu)) ||
> need_activate_page_drain(cpu)) {
> INIT_WORK(work, lru_add_drain_per_cpu);
> schedule_work_on(cpu, work);
> --
> 1.9.1
>

--
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Greg Rose: "Re: [PATCH 0/8] Adds Intel FieldsPeak NFC solution driver"
Previous message: Steven Rostedt: "Re: [tip:sched/core] sched/rt: Reduce rq lock contention by eliminating locking of non-feasible target"
In reply to: Minchan Kim: "[PATCH RFC 3/4] mm: move lazy free pages to inactive list"
Next in thread: Minchan Kim: "Re: [PATCH RFC 3/4] mm: move lazy free pages to inactive list"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]