Re: [PATCH v2 0/5] variable-order, large folios for anonymous memory

From: David Hildenbrand
Date: Fri Jul 07 2023 - 07:41:42 EST


On 06.07.23 10:02, Ryan Roberts wrote:
> On 05/07/2023 20:38, David Hildenbrand wrote:
>> On 03.07.23 15:53, Ryan Roberts wrote:
>>> Hi All,
>>>
>>> This is v2 of a series to implement variable order, large folios for anonymous
>>> memory. The objective of this is to improve performance by allocating larger
>>> chunks of memory during anonymous page faults. See [1] for background.
>>>
>>>
>>> [...]
>>>
>>> Thanks,
>>> Ryan
>>
>> Hi Ryan,
>>
>> is page migration already working as expected (what about page compaction?), and
>> do we handle migration -ENOMEM when allocating a target page: do we split and
>> fall back to 4k page migration?
>>
>
> Hi David, All,
>
Hi Ryan,

thanks a lot for the list.

But can you comment on the page migration part (IOW did you try it already)?

For example, memory hotunplug, CMA, MCE handling and compaction all rely on page migration actually working for anything that was allocated using GFP_MOVABLE.

Compaction seems to skip any higher-order folios, but the question is whether the underlying migration itself works.

If it already works: great! If not, this really has to be tackled early, because otherwise we'll be breaking the GFP_MOVABLE semantics.


> This series aims to be the bare minimum to demonstrate allocation of large anon
> folios. As such, there is a laundry list of things that need to be done for this
> feature to play nicely with other features. My preferred route is to merge this
> with its Kconfig defaulted to disabled, and its Kconfig description clearly
> shouting that it's EXPERIMENTAL with an explanation of why (similar to
> READ_ONLY_THP_FOR_FS).

As long as we are not sure about the user space control, and as long as basic functionality (for example, page migration) is not working, I would tend not to merge this early just for the sake of it.

But yes, something like mlock can be tackled later: as long as there is a runtime interface to disable it ;)


> That said, I've put together a table of the items that I'm aware of that need
> attention. It would be great if people can review and add any missing items.
> Then we can hopefully parallelize the implementation work. David, I don't think
> the items you raised are covered - would you mind providing a bit more detail so
> I can add them to the list? (or just add them to the list yourself, if you prefer).
>
> ---
>
> - item:
>     mlock
>
>   description: >-
>     Large, pte-mapped folios are ignored when mlock is requested. Code comment
>     for mlock_vma_folio() says "...filter out pte mappings of THPs, which
>     cannot be consistently counted: a pte mapping of the THP head cannot be
>     distinguished by the page alone."
>
>   location:
>     - mlock_pte_range()
>     - mlock_vma_folio()
>
>   assignee:
>     Yin, Fengwei
>
>
> - item:
>     numa balancing
>
>   description: >-
>     Large, pte-mapped folios are ignored by numa-balancing code. Commit
>     comment (e81c480): "We're going to have THP mapped with PTEs. It will
>     confuse numabalancing. Let's skip them for now."
>
>   location:
>     - do_numa_page()
>
>   assignee:
>     <none>
>
>
> - item:
>     madvise
>
>   description: >-
>     MADV_COLD, MADV_PAGEOUT, MADV_FREE: For large folios, code assumes
>     exclusive only if mapcount==1, else skips remainder of operation. For
>     large, pte-mapped folios, exclusive folios can have mapcount up to nr_pages
>     and still be exclusive. Even better: don't split the folio if it fits
>     entirely within the range? Discussion at
>
>     https://lore.kernel.org/linux-mm/6cec6f68-248e-63b4-5615-9e0f3f819a0a@xxxxxxxxxx/
>     talks about changing folio mapcounting - may help determine if exclusive
>     without pgtable scan?
>
>   location:
>     - madvise_cold_or_pageout_pte_range()
>     - madvise_free_pte_range()
>
>   assignee:
>     <none>
>
>
> - item:
>     shrink_folio_list
>
>   description: >-
>     Raised by Yu Zhao; I can't see the problem in the code - need
>     clarification
>
>   location:
>     - shrink_folio_list()
>
>   assignee:
>     <none>
>
>
> - item:
>     compaction
>
>   description: >-
>     Raised at LSFMM: Compaction skips non-order-0 pages. Already a problem for
>     page-cache pages today. Is my understanding correct?
>
>   location:
>     - <where?>
>
>   assignee:
>     <none>

I'm still thinking about the whole mapcount thingy (and I burned way too much time on that yesterday), which is a big item for such a list and affects some of these items.

The cost of a page-table scan is pretty much irrelevant for order-2 folios (only 4 PTEs). But once we're talking about higher orders, we really don't want to do that.

I'm preparing a writeup with users and challenges.


Is swapping working as expected? zswap?

--
Cheers,

David / dhildenb