Re: [PATCH v3 4/5] mm: introduce MADV_PAGEOUT

From: Michal Hocko
Date: Tue Jul 09 2019 - 05:55:24 EST


On Thu 27-06-19 20:54:04, Minchan Kim wrote:
> When a process expects no accesses to a certain memory range
> for a long time, it could hint kernel that the pages can be
> reclaimed instantly but data should be preserved for future use.
> This could reduce workingset eviction so it ends up increasing
> performance.
>
> This patch introduces the new MADV_PAGEOUT hint to madvise(2)
> syscall. MADV_PAGEOUT can be used by a process to mark a memory
> range as not expected to be used for a long time so that kernel
> reclaims *any LRU* pages instantly. The hint can help kernel in
> deciding which pages to evict proactively.
>
> - man-page material
>
> MADV_PAGEOUT (since Linux x.x)
>
> Do not expect access in the near future so pages in the specified
> regions could be reclaimed instantly regardless of memory pressure.
> Thus, access in the range after successful operation could cause
> major page fault but never lose the up-to-date contents unlike
> MADV_DONTNEED.

> It works for only private anonymous mappings and
> non-anonymous mappings that belong to files that the calling process
> could successfully open for writing; otherwise, it could be used for
> sidechannel attack.

I would rephrase this way:
"
Pages belonging to a shared mapping are only processed if a write access
is allowed for the calling process.
"

I wouldn't really mention side channel attacks for a man page. You can
mention can_do_mincore check and the side channel prevention in the
changelog that is not aimed for the man page.

> MADV_PAGEOUT cannot be applied to locked pages, Huge TLB pages, or
> VM_PFNMAP pages.
>
> * v2
> * add comment about SWAP_CLUSTER_MAX - mhocko
> * add permission check to prevent sidechannel attack - mhocko
> * add man page stuff - dave
>
> * v1
> * change pte to old and rely on the other's reference - hannes
> * remove page_mapcount to check shared page - mhocko
>
> * RFC v2
> * make reclaim_pages simple via factoring out isolate logic - hannes
>
> * RFCv1
> * rename from MADV_COLD to MADV_PAGEOUT - hannes
> * bail out if process is being killed - Hillf
> * fix reclaim_pages bugs - Hillf
>
> Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
> ---


I am still not convinced about the SWAP_CLUSTER_MAX batching and the
udnerlying OOM argument. Is one pmd worth of pages really an OOM risk?
Sure you can have many invocations in parallel and that would add on
but the same might happen with SWAP_CLUSTER_MAX. So I would just remove
the batching for now and think of it only if we really see this being a
problem for real. Unless you feel really strong about this, of course.

Anyway the patch looks ok to me otherwise.

Acked-by: Michal Hocko <mhocko@xxxxxxx>
--
Michal Hocko
SUSE Labs