Re: [RFC PATCH 0/3] support large folio for mlock

From: Ryan Roberts
Date: Mon Jul 10 2023 - 06:36:53 EST


On 07/07/2023 20:26, Matthew Wilcox wrote:
> On Fri, Jul 07, 2023 at 09:15:02PM +0200, David Hildenbrand wrote:
>>>> Sure, any time we PTE-map a THP we might just say "let's put that on the
>>>> deferred split queue" and cross fingers that we can eventually split it
>>>> later. (I was recently thinking about that in the context of the mapcount
>>>> ...)
>>>>
>>>> It's all a big mess ...
>>>
>>> Oh, I agree, there are always going to be circumstances where we realise
>>> we've made a bad decision and can't (easily) undo it. Unless we have a
>>> per-page pincount, and I Would Rather Not Do That.
>>
>> I agree ...
>>
>>> But we should _try_
>>> to do that because it's the right model -- that's what I meant by "Tell
>>
>> Try to have per-page pincounts? :/ or do you mean, try to split on VMA
>> split? I hope the latter (although I'm not sure about performance) :)
>
> Sorry, try to split a folio on VMA split.
>
>>> me why I'm wrong"; what scenarios do we have where a user temporarily
>>> mlocks (or mprotects or ...) a range of memory, but wants that memory
>>> to be aged in the LRU exactly the same way as the adjacent memory that
>>> wasn't mprotected?
>>
>> Let me throw in a "fun one".
>>
>> Parent process has a 2 MiB range populated by a THP. fork() a child process.
>> Child process mprotects half the VMA.
>>
>> Should we split the (COW-shared) THP? Or should we COW/unshare in the child
>> process (ugh!) during the VMA split.
>>
>> It all makes my brain hurt.
>
> OK, so this goes back to what I wrote earlier about attempting to choose
> what size of folio to allocate on COW:
>
> https://lore.kernel.org/linux-mm/Y%2FU8bQd15aUO97vS@xxxxxxxxxxxxxxxxxxxx/
>
> : the parent had already established
> : an appropriate size folio to use for this VMA before calling fork().
> : Whether it is the parent or the child causing the COW, it should probably
> : inherit that choice and we should default to the same size folio that
> : was already found.

FWIW, I had patches in my original RFC that aimed to follow this policy for
large anon folios [1] & [2], and intend to follow up with a modified version of
these patches once we have an initial submission.
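
To make the policy quoted above concrete, here is a rough sketch of "inherit
the folio size that was already found" as a standalone C function. This is
not code from the patches in [1] or [2]. The helper name cow_folio_order(),
the 4 KiB base page size, and the fallback of stepping down one order at a
time when the naturally aligned range would cross a VMA boundary are all
assumptions made for illustration.

```c
#include <assert.h>

#define PAGE_SHIFT 12UL	/* assumption: 4 KiB base pages */

/*
 * Hypothetical helper, not a kernel function.
 *
 * Pick the folio order to allocate for a COW fault at 'addr', given that
 * the page being copied belongs to a folio of order 'found_order'.
 * Default to the order already found; step down only while the naturally
 * aligned range for that order does not fit inside [vm_start, vm_end).
 */
static unsigned int cow_folio_order(unsigned long vm_start,
				    unsigned long vm_end,
				    unsigned long addr,
				    unsigned int found_order)
{
	unsigned int order = found_order;

	/* The faulting address must lie inside the VMA. */
	assert(vm_start <= addr && addr < vm_end);

	while (order > 0) {
		unsigned long size = 1UL << (PAGE_SHIFT + order);
		unsigned long start = addr & ~(size - 1);

		if (start >= vm_start && start + size <= vm_end)
			break;		/* aligned range fits in the VMA */
		order--;		/* try the next smaller size */
	}

	return order;
}
```

With a 2 MiB VMA at 0x200000 and an order-9 folio already in place, a COW
fault at 0x234000 keeps order 9. If the VMA has been split in half by
mprotect(), so that it now ends at 0x300000, the same fault falls back to
order 8 under the assumptions above.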

[1] https://lore.kernel.org/linux-mm/20230414130303.2345383-11-ryan.roberts@xxxxxxx/
[2] https://lore.kernel.org/linux-mm/20230414130303.2345383-15-ryan.roberts@xxxxxxx/