Re: [PATCH v2 1/1] mm/madvise: enhance lazyfreeing with mTHP in madvise_free

From: Ryan Roberts
Date: Mon Mar 11 2024 - 05:56:28 EST


[...]

>>>>> we don't want reclamation overhead later. and we want memories immediately
>>>>> available to others.
>>>>
>>>> But by that logic, you also don't want to leave the large folio partially mapped
>>>> all the way until the last subpage is CoWed. Surely you would want to reclaim it
>>>> when you reach partial map status?
>>>
>>> To some extent, I agree. But then we will have two many copies. The last
>>> subpage is small, and a safe place to copy instead.
>>>
>>> We actually had to tune userspace to decrease partial map as too much
>>> partial map both unfolded CONT-PTE and wasted too much memory. if a
>>> vma had too much partial map, we disabled mTHP on this VMA.
>>
>> I actually had a whacky idea around introducing selectable page size ABI
>> per-process that might help here. I know Android is doing work to make the
>> system 16K page compatible. You could run most of the system processes with 16K
>> ABI on top of 4K kernel. Then those processes don't even have the ability to
>> madvise/munmap/mprotect/mremap anything less than 16K alignment so that acts as
>> an anti-fragmentation mechanism while allowing non-16K capable processes to run
>> side-by-side. Just a passing thought...
>
> Right, this project faces a challenge in supporting legacy
> 4KiB-aligned applications.
> but I don't find it will be an issue to run 16KiB-aligned applications
> on a kernel whose
> page size is 4KiB.

Yes, agreed that a 16K-aligned (or 64K-aligned) app will work without issue on
4K kernel, but it will also use getpagesize() and know what the page size is.
I'm suggesting you could actually run these apps on a 4K kernel but with a 16K
ABI and potentially get close to the native 16K performance out of them. It's
just a thought though - I don't have any data that actually shows this is better
than just running on a 4K kernel with a 4K ABI, and using 16K or 64K mTHP
opportunistically.