Re: mbind() breaks its API definition since v5.2 by commit d883544515aa (mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified)

From: Yang Shi
Date: Wed Oct 30 2019 - 00:33:10 EST




On 10/29/19 8:12 PM, Li Xinhai wrote:
On 2019-10-30Âat 10:50ÂYang ShiÂwrote:

On 10/29/19 7:27 PM, Li Xinhai wrote:
One change in do_mbind() of this commit has suspicious usage of return value of
queue_pages_range(), excerpt as below:

---
@@ -1243,10 +1265,15 @@ static long do_mbind(unsigned long start, unsigned long len,
  if (err)
  goto mpol_out;
- err = queue_pages_range(mm, start, end, nmask,
+ ret = queue_pages_range(mm, start, end, nmask,
  Âflags | MPOL_MF_INVERT, &pagelist);
- if (!err)
- err = mbind_range(mm, start, end, new);
+
+ if (ret < 0) { Â Â Â/////// convert to all possible 'ret' to '-EIO' <<<<
+ err = -EIO;
+ goto up_out;
+ }
+
+ err = mbind_range(mm, start, end, new);
  if (!err) {
  int nr_failed = 0;
---

Note that insideÂqueue_pages_range(), the call toÂwalk_page_range() may return
errors from 'test_walk' of 'struct mm_walk_ops', e.g. -EFAULT. Now, those error
codes are no longer reported to user space application.

 From user space, the mbind() call need to reported error, with EFAULT, as example:
EFAULT
Part or all of the memory range specified by nodemask and maxnode points
outside your accessible address space. Or, there was an unmapped hole in the
specified memory range specified by addr and len.
Thanks for catching this. That commit was aimed to correct the return
values for some corner cases in mbind(), but it should not alter the
errno for other failure cases, i.e. -EFAULT.

Could you please try the below patch (build test only)?

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4ae967b..99df43a 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1286,7 +1286,7 @@ static long do_mbind(unsigned long start, unsigned
long len,
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ flags | MPOL_MF_INVERT, &pagelist);

ÂÂÂÂÂÂÂ if (ret < 0) {
-ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ err = -EIO;
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ err = ret;
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ goto up_out;
ÂÂÂÂÂÂÂ }


This seems do not work, because the 'pagelist' would have some pages queued
into it, need to put back those pages instead of return quickly.

So, we need to remove this page leak as well. <<<<<<

In my understanding, revert the changes as I quoted above may solve it, but not sure
the details aboutÂchanges at end of do_mbind(), should keep them at there without
further change?

Thanks for pointing this out. We don't have to revert this commit to handle the non-empty pagelist correctly. The simplest way is to just put those pages back and I'm supposed this is also the preferred way since mbind_range() is not called to really apply the policy so those pages should not be migrated.

The below patch should solve this:

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4ae967b..d80025c 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1286,7 +1286,10 @@ static long do_mbind(unsigned long start, unsigned long len,
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ flags | MPOL_MF_INVERT, &pagelist);

ÂÂÂÂÂÂÂ if (ret < 0) {
-ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ err = -EIO;
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ if (!list_empty(&pagelist))
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ putback_movable_pages(&pagelist);
+
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ err = ret;
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ goto up_out;
ÂÂÂÂÂÂÂ }


- Xinhai

Please correct me if this is the intended change(and will have updated API
definition), or something was misunderstood.

-Xinhai
>