Re: [RFC PATCH] mm: support large folio numa balancing

From: Baolin Wang
Date: Tue Nov 14 2023 - 05:53:53 EST




On 11/13/2023 10:49 PM, David Hildenbrand wrote:
On 13.11.23 13:59, Baolin Wang wrote:


On 11/13/2023 6:53 PM, David Hildenbrand wrote:
On 13.11.23 11:45, Baolin Wang wrote:
Currently, file pages already support large folios, and support for
anonymous pages is also under discussion[1]. Moreover, the NUMA balancing
code was converted to use a folio by a previous thread[2], and the
migrate_pages function already supports large folio migration.

So I do not see any reason to continue restricting NUMA balancing for
large folios.

I recall John wanted to look into that. CCing him.

I'll note that the "head page mapcount" heuristic to detect sharers will
now strike on the PTE path and make us believe that a large folio is
exclusive, although it isn't.

As spelled out in the commit you are referencing:

commit 6695cf68b15c215d33b8add64c33e01e3cbe236c
Author: Kefeng Wang <wangkefeng.wang@xxxxxxxxxx>
Date:   Thu Sep 21 15:44:14 2023 +0800

      mm: memory: use a folio in do_numa_page()

      Numa balancing only try to migrate non-compound page in do_numa_page(),
      use a folio in it to save several compound_head calls, note we use
      folio_estimated_sharers(), it is enough to check the folio sharers since
      only normal page is handled, if large folio numa balancing is supported, a
      precise folio sharers check would be used, no functional change intended.

Thanks for pointing out the part I missed.

I saw that the migrate_pages() syscall also uses
folio_estimated_sharers() to check whether a folio is shared, and I wonder
whether that will cause any significant issues?

It's now used all over the place, in some places for making manual decisions (e.g., MADV_PAGEOUT works although it shouldn't) and more and more automatic places (e.g., the system ends up migrating a folio although it shouldn't). The nasty thing about it is that it doesn't give you "certainly exclusive" vs. "maybe shared" but "maybe exclusive" vs. "certainly shared".

IIUC, the side effect could be that we migrate folios because we assume they are exclusive even though they are actually shared. Right now, it's sufficient to not have the first page of the folio mapped anymore for that to happen.

Yes.
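
To make the failure mode above concrete, here is a minimal userspace sketch, not kernel code. It assumes, as described in this thread, that the estimate only looks at the mapcount of the folio's first page. The toy_* names, the fixed-size arrays, and the two-process scenario are all hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PAGES 16
#define TOY_MAX_PROCS 4

/*
 * Toy stand-in for a PTE-mapped large folio (hypothetical, not a kernel
 * structure): mapped[p][i] is true when process p maps page i.
 */
struct toy_folio {
	size_t nr_pages;
	bool mapped[TOY_MAX_PROCS][TOY_MAX_PAGES];
};

/* Number of processes that map page idx of the folio. */
static int toy_page_mapcount(const struct toy_folio *f, size_t idx)
{
	int count = 0;

	assert(idx < f->nr_pages);
	for (size_t p = 0; p < TOY_MAX_PROCS; p++)
		count += f->mapped[p][idx];
	return count;
}

/* Models the "head page mapcount" heuristic: only page 0 is inspected. */
static int toy_estimated_sharers(const struct toy_folio *f)
{
	return toy_page_mapcount(f, 0);
}

/* Ground truth: processes that map at least one page of the folio. */
static int toy_actual_sharers(const struct toy_folio *f)
{
	int sharers = 0;

	for (size_t p = 0; p < TOY_MAX_PROCS; p++) {
		for (size_t i = 0; i < f->nr_pages; i++) {
			if (f->mapped[p][i]) {
				sharers++;
				break;
			}
		}
	}
	return sharers;
}

/*
 * Scenario: process 0 maps all four pages, process 1 maps every page
 * except the first one (for example after a partial unmap).
 */
static struct toy_folio toy_partially_unmapped_folio(void)
{
	struct toy_folio f = { .nr_pages = 4 };

	for (size_t i = 0; i < f.nr_pages; i++) {
		f.mapped[0][i] = true;
		f.mapped[1][i] = (i != 0);
	}
	return f;
}
```

For the folio built by toy_partially_unmapped_folio(), toy_estimated_sharers() reports 1 while toy_actual_sharers() reports 2. In this model an estimate above 1 always means the folio is shared, but an estimate of 1 only means "maybe exclusive".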

Anyhow, it's worth mentioning that in the commit message as long as we have no better solution for that. For many cases it might be just tolerable.

Agreed. A 'maybe shared' folio may affect the NUMA group statistics, which are used to accumulate the NUMA faults in one group to choose a preferred node for the tasks. For this case it may be tolerable too, but I have no performance numbers yet. Let me think about it.

I'll send WIP patches for one approach that can improve the situation soonish.

Great. Looking forward to seeing this :)

I'm still trying to evaluate the performance hit of the additional tracking ... turns out there is no such thing as free food ;)

Makes sense.