Re: [PATCH mm-unstable v1] mm: add a total mapcount for large folios

From: David Hildenbrand
Date: Fri Aug 11 2023 - 11:17:56 EST


On 11.08.23 17:03, Peter Xu wrote:
On Thu, Aug 10, 2023 at 11:59:25PM +0200, David Hildenbrand wrote:
On 10.08.23 23:54, Matthew Wilcox wrote:
On Thu, Aug 10, 2023 at 05:48:19PM -0400, Peter Xu wrote:
Yes, that comment from Hugh primarily discusses how we could possibly
optimize the loop, and whether relying on folio_nr_pages_mapped() to reduce
the iterations would be racy. As far as I can see, there are cases where "it
would be certainly a bad idea" :)

Is the race described about mapcount being changed right after it's read?
Are you aware of anything specific that will be broken, and will be fixed
with this patch?

The problem is that people check the mapcount while holding no locks;
not the PTL, not the page lock. So it's an unfixable race.

Having a total mapcount does sound helpful if partially mapped folios are
indeed common.

I'm curious whether that'll be so common after the large anon folio work -
wouldn't it be sad if partial folios became the norm? It sounds to me like
that's the case where small page sizes should be used.. and it's prone to
waste?

The problem is that entire_mapcount isn't really entire_mapcount.
It's pmd_mapcount. I have had thoughts about using it as entire_mapcount,
but it gets gnarly when people do partial unmaps. So the _usual_ case
ends up touching every struct page. Which sucks. Also it's one of the
things which stands in the way of shrinking struct page.

Right, so one current idea is to have a single total_mapcount and look into
removing the subpage mapcounts (which will require first removing
_nr_pages_mapped, because that's still one of the important users).

Until we get there, the rmap code will also eventually have to do "more
tracking" and might, unfortunately, end up slower.
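
To illustrate the trade-off discussed above, here is a minimal userspace
sketch in C. It is a toy model, not kernel code: struct toy_folio, its
fields, and all toy_* functions are hypothetical names invented for this
example. It assumes that each PTE mapping of a subpage and each PMD mapping
of the folio counts as one mapping, and it contrasts summing per-subpage
counters (touching every subpage) with reading a single total counter that is
maintained at map/unmap time.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PAGES 512 /* e.g. a 2 MiB folio made of 4 KiB pages */

/* Toy stand-in for a large folio; NOT the kernel's struct folio. */
struct toy_folio {
	size_t nr_pages;                  /* number of subpages in use */
	int page_mapcount[TOY_MAX_PAGES]; /* per-subpage PTE mapcounts */
	int pmd_mapcount;                 /* whole-folio mappings via one PMD */
	int total_mapcount;               /* assumed single per-folio counter */
};

/* Map one subpage via a PTE: updates both the subpage and the total. */
void toy_map_pte(struct toy_folio *f, size_t idx)
{
	assert(idx < f->nr_pages);
	f->page_mapcount[idx]++;
	f->total_mapcount++;
}

/* Unmap one PTE-mapped subpage. */
void toy_unmap_pte(struct toy_folio *f, size_t idx)
{
	assert(idx < f->nr_pages && f->page_mapcount[idx] > 0);
	f->page_mapcount[idx]--;
	f->total_mapcount--;
}

/* Map the whole folio via a single PMD; assumed to count as one mapping. */
void toy_map_pmd(struct toy_folio *f)
{
	f->pmd_mapcount++;
	f->total_mapcount++;
}

/* Without a total counter: O(nr_pages), reads every subpage counter. */
int toy_total_mapcount_slow(const struct toy_folio *f)
{
	int sum = f->pmd_mapcount;

	for (size_t i = 0; i < f->nr_pages; i++)
		sum += f->page_mapcount[i];
	return sum;
}

/* With a total counter: O(1), a single read. */
int toy_total_mapcount_fast(const struct toy_folio *f)
{
	return f->total_mapcount;
}
```

In this model both functions always agree, but only the slow one has to walk
the subpages. The sketch ignores locking and atomicity entirely, which is
where the races discussed earlier in the thread come from.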


But it's kind of annoying to explain all of this to you individually.
There have been hundreds of emails about it over the last months on
this mailing list. It would be nice if you could catch up instead of
jumping in.

To be fair, a lot of the details are not readily available and in the heads
of selected people :)

Peter, if you're interested, we can discuss the current plans, issues and
ideas offline!

Thanks for offering help, David.

Personally I am still unclear on why entire_mapcount cannot be used as a
full-folio mapcount, and why "partial unmap" would happen a lot (I wouldn't
expect it to), but yeah, I can try to catch up and educate myself first.

Using fork() is the easiest way to get there. Others: mremap(), MADV_DONTNEED, munmap(), ...

You might end up having to scan page tables and/or the rmap to figure out which mapcount to adjust, which we should absolutely avoid.
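
A rough userspace sketch of that problem, in C. Everything here is
hypothetical and invented for illustration (struct toy_efolio, struct
toy_process, the toy_* functions); it is not how the kernel accounts
mappings. It assumes a scheme where entire_mapcount means "some process maps
every subpage" and per-subpage counters cover only partial mappings. Under
that assumption, unmapping a single subpage first needs a scan of the
process's page table to decide which counter to adjust, and a full mapping
that becomes partial has to touch every remaining subpage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES 4

/* Toy folio: hypothetical accounting scheme, not the kernel's. */
struct toy_efolio {
	int entire_mapcount;             /* processes mapping ALL subpages */
	int page_mapcount[TOY_NR_PAGES]; /* partial mappings, per subpage */
};

/* Stand-in for one process's page table entries covering the folio. */
struct toy_process {
	bool pte_present[TOY_NR_PAGES];
};

/* The "page table scan": does this process map every subpage? */
bool toy_maps_entire(const struct toy_process *p)
{
	for (size_t i = 0; i < TOY_NR_PAGES; i++)
		if (!p->pte_present[i])
			return false;
	return true;
}

/* Map one subpage; if the mapping becomes complete, convert the counters. */
void toy_map_one(struct toy_efolio *f, struct toy_process *p, size_t idx)
{
	assert(idx < TOY_NR_PAGES && !p->pte_present[idx]);
	p->pte_present[idx] = true;
	if (toy_maps_entire(p)) {
		/* Partial became full: drop the per-subpage contributions. */
		for (size_t i = 0; i < TOY_NR_PAGES; i++)
			if (i != idx)
				f->page_mapcount[i]--;
		f->entire_mapcount++;
	} else {
		f->page_mapcount[idx]++;
	}
}

/* Unmap one subpage; the scan decides which counter to adjust. */
void toy_unmap_one(struct toy_efolio *f, struct toy_process *p, size_t idx)
{
	assert(idx < TOY_NR_PAGES && p->pte_present[idx]);
	if (toy_maps_entire(p)) {
		/* Full became partial: touch every remaining subpage. */
		f->entire_mapcount--;
		for (size_t i = 0; i < TOY_NR_PAGES; i++)
			if (i != idx)
				f->page_mapcount[i]++;
	} else {
		f->page_mapcount[idx]--;
	}
	p->pte_present[idx] = false;
}
```

With a 512-subpage folio, the scan and the conversion loops in this toy scheme
would run over all 512 entries on a single-page munmap().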


The only issue with an offline sync-up is that even if David helps Peter
catch up, it won't scale when another Peter2 has the same question.. So
David, rather than wasting your time on helping one person, let me try to
catch up with the public threads - I'm not sure how far I can get myself;

Sure. But note that it's a moving target, and some discussions have been going on for a long time. I recall there were various discussions, including at LSF/MM, the biweekly mm meeting, and more. So even if you scan through all that, you might get either outdated or incomplete information.

--
Cheers,

David / dhildenb