mbind MPOL_INTERLEAVE existing pages

From: Mike Kravetz
Date: Mon May 01 2023 - 14:59:04 EST


I received a question from a customer who was trying to move pages via
the mbind system call. In this specific case, the system had two nodes
and all pages in the range were already present on node 0. They then
called mbind with mode MPOL_INTERLEAVE and the MPOL_MF_MOVE_ALL flag. Their
expectation was that half the pages in the range would be moved to node 1
in an interleaved pattern.

In the above situation, no pages actually get moved. This is because mbind
creates a list of pages to be moved via:

	ret = queue_pages_range(mm, start, end, nmask,
			  flags | MPOL_MF_INVERT, &pagelist);

No page is added to the list because queue_folio_required() is called for
each page to determine whether it resides within the set of nodes in nmask.
With MPOL_MF_INVERT set, only pages outside that set are queued, and all
pages are already within the set.
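
To make the filtering concrete, here is a small userspace model of that
check. It is only a sketch and not the kernel code: sim_queue_required(),
sim_count_queued() and SIM_MF_INVERT are hypothetical names made up for
illustration, and the nodemask is reduced to a single unsigned long with
one bit per node.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel-internal MPOL_MF_INVERT flag. */
#define SIM_MF_INVERT (1u << 0)

/*
 * Simplified model of the per-page test. Without the invert flag a page
 * is queued when its node is in the mask; with the invert flag a page is
 * queued only when its node is NOT in the mask.
 */
static bool sim_queue_required(int nid, unsigned long nodemask,
			       unsigned int flags)
{
	/* This model only supports as many nodes as an unsigned long has bits. */
	assert(nid >= 0 && nid < (int)(sizeof(nodemask) * CHAR_BIT));

	bool in_mask = (nodemask >> nid) & 1UL;

	return in_mask == !(flags & SIM_MF_INVERT);
}

/* Count how many pages would be put on the move list. */
static size_t sim_count_queued(const int *page_nid, size_t npages,
			       unsigned long nodemask, unsigned int flags)
{
	size_t queued = 0;

	for (size_t i = 0; i < npages; i++)
		if (sim_queue_required(page_nid[i], nodemask, flags))
			queued++;

	return queued;
}
```

Calling sim_count_queued() with every page on node 0, a mask covering
nodes 0 and 1, and SIM_MF_INVERT set yields zero queued pages in this
model, which mirrors the customer's case. Shrinking the mask to node 1
only would queue every page.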

I have reread the mbind man page several times and agree that one might
expect MPOL_INTERLEAVE with MPOL_MF_MOVE_ALL to move pages and create an
interleaved pattern. My question is should we:
- Change mbind so that pages are moved to an interleaved pattern?
- Update the documentation to be more explicit?

I can do either, but just wanted to get opinions before starting.
--
Mike Kravetz