Re: [PATCH v2 01/13] mm/munlock: delete page_mlock() and all its works

From: Vlastimil Babka
Date: Mon Feb 14 2022 - 05:45:04 EST


On 2/14/22 07:59, Hugh Dickins wrote:
> We have recommended that some applications mlock their userspace, but
> that turns out to be counter-productive: when many processes mlock the
> same file, contention on rmap's i_mmap_rwsem can become intolerable at
> exit. The semaphore is needed for write, to remove any vma mapping
> that file from rmap's tree; but it is hogged for read by those with
> mlocks calling page_mlock() (formerly known as try_to_munlock()) on
> *each* page mapped from the file (the purpose being to find out
> whether another process has the page mlocked, and so it should not be
> munlocked yet).
>
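
To spell out the contention for other reviewers: every munlock of a
file page did an rmap-tree walk under i_mmap_rwsem held for read, while
unmap at exit needs the same rwsem for write. Boiled down, the pattern
being deleted looks something like this (a simplified sketch, not the
exact page_mlock() code - the real one goes through rmap_walk() and
checks ptes; the _sketch name is mine):

/* Sketch only: assumes kernel mm context (mm/internal.h and friends). */
static bool page_mlock_sketch(struct page *page)
{
        struct address_space *mapping = page_mapping(page);
        pgoff_t pgoff = page_to_pgoff(page);
        struct vm_area_struct *vma;
        bool mlocked = false;

        /*
         * Taken for read on *each* page: this is what starves the
         * writers trying to remove vmas from the tree at exit time.
         */
        i_mmap_lock_read(mapping);
        vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
                if (vma->vm_flags & VM_LOCKED) {
                        /* another vma still mlocks it: keep it Mlocked */
                        mlocked = true;
                        break;
                }
        }
        i_mmap_unlock_read(mapping);
        return mlocked;
}
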
> Several optimizations have been made in the past: one is to skip
> page_mlock() when the mapcount shows that nothing else has this page
> mapped; but that doesn't help at all when others do have it mapped.
> This time around, I initially intended to add a preliminary search of
> the rmap tree for overlapping VM_LOCKED ranges; but that gets messy
> with lock ordering when it's in doubt whether a page is actually
> present; and risks adding even more contention on the i_mmap_rwsem.
>
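
That shortcut, if I'm reading the old code correctly, is the
page_mapcount() check in __munlock_isolated_page(), roughly:

/* Roughly the old shortcut (simplified, not verbatim). */
static void __munlock_isolated_page_sketch(struct page *page)
{
        /*
         * If the page was mapped just once, that must be our own
         * mapping: no need to walk all the other vmas.
         */
        if (page_mapcount(page) > 1)
                page_mlock(page);       /* the expensive rmap walk */

        /* Did the walk find another VM_LOCKED vma and re-mlock it? */
        if (!PageMlocked(page))
                count_vm_events(PGMUNLOCKED, thp_nr_pages(page));

        putback_lru_page(page);
}

As the message says, that's no help at all when many processes have the
same file mlocked.
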
> A solution would be much easier, if only there were space in struct page
> for an mlock_count... but actually, most of the time, there is space for
> it - an mlocked page spends most of its life on an unevictable LRU, but
> since 3.18 removed the scan_unevictable_pages sysctl, that "LRU" has
> been redundant. Let's try to reuse its page->lru.
>
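
And to picture where that is heading: while a page sits on the
unevictable "LRU", its list linkage is never traversed, so the later
patch can overlay a count on it. From my reading of the series, the
slot inside struct page looks roughly like this (a sketch of the idea,
not necessarily the exact layout; the _sketch wrapper is mine):

#include <linux/list.h>

/* Sketch of the overlaid slot inside struct page (not the full struct). */
struct unevictable_slot_sketch {
        union {
                struct list_head lru;     /* while on a real, scanned LRU */
                struct {                  /* or, while unevictable: */
                        void *__filler;           /* keep word even, to negate PageTail */
                        unsigned int mlock_count; /* VM_LOCKED vmas covering the page */
                };
        };
};
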
> But leave that until a later patch: in this patch, clear the ground by
> removing page_mlock() and all the infrastructure that has gathered
> around it - which mostly hinders understanding, and would make
> reviewing the new additions harder. Don't mind those old comments
> about THPs; they date from before 4.5's refcounting rework, so
> splitting is not a risk here.
>
> Just keep a minimal version of munlock_vma_page(), as a reminder of
> what it should attend to (in particular, the odd way PGSTRANDED is
> counted out of PGMUNLOCKED), and likewise a stub for
> munlock_vma_pages_range(). Move the unchanged
> __mlock_posix_error_return() out of the way, down to just above its
> caller: this series then makes no further change after mlock_fixup().
>
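
The PGSTRANDED subtlety deserves that reminder. For other reviewers,
the minimal version kept here boils down to something like this (my
compression of the diff, not verbatim):

static void munlock_vma_page_sketch(struct page *page)
{
        int nr_pages;

        if (!TestClearPageMlocked(page))
                return;

        nr_pages = thp_nr_pages(page);
        mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);

        if (!isolate_lru_page(page)) {
                /* isolated: putback picks the right LRU, count as munlocked */
                putback_lru_page(page);
                count_vm_events(PGMUNLOCKED, nr_pages);
        } else if (PageUnevictable(page)) {
                /*
                 * Could not isolate: the page stays stranded on the
                 * unevictable LRU, so it is counted as PGSTRANDED
                 * instead of (not in addition to) PGMUNLOCKED.
                 */
                count_vm_events(PGSTRANDED, nr_pages);
        }
}
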
> After this and each following commit, the kernel builds, boots and
> runs; but with deficiencies which may show up in testing of mlock and
> munlock. The system calls succeed or fail as before, and mlock remains
> effective in preventing page reclaim; but meminfo's Unevictable and
> Mlocked amounts may be shown too low after mlock, grow, then stay too
> high after munlock: previously mlocked pages remain unevictable for
> too long, until they are finally unmapped and freed and the counts
> corrected. Normal service will be resumed in "mm/munlock:
> mlock_pte_range() when mlocking or munlocking".

Great!

> Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>

Acked-by: Vlastimil Babka <vbabka@xxxxxxx>