Re: [RFC PATCH 0/3] support large folio for mlock

From: Matthew Wilcox
Date: Fri Jul 07 2023 - 13:27:03 EST


On Sat, Jul 08, 2023 at 12:52:18AM +0800, Yin Fengwei wrote:
> This series classifies large folios for mlock into two types:
> - The large folio lies entirely within a VM_LOCKED VMA range
> - The large folio crosses a VM_LOCKED VMA boundary

This is somewhere that I think our fixation on MUST USE PMD ENTRIES
has led us astray. Today when the arguments to mlock() cross a folio
boundary, we split the PMD entry but leave the folio intact. That means
that we continue to manage the folio as a single entry on the LRU list.
But userspace may have no idea that we're doing this. It may have made
several calls to mmap(), 256kB at a time; they've all been coalesced
into a single VMA, and khugepaged has come along behind its back and
created a 2MB THP. Now userspace calls mlock() and, instead of treating
that as a hint that oops, maybe we shouldn't've done that, we do our
utmost to preserve the 2MB folio.

I think this whole approach needs rethinking. IMO, anonymous folios
should not cross VMA boundaries. Tell me why I'm wrong.