Re: [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory

From: Reza Arbab
Date: Tue Sep 26 2017 - 14:35:36 EST


On Tue, Sep 26, 2017 at 04:32:41PM +0000, Michal Hocko wrote:
> On Tue 26-09-17 09:47:10, Reza Arbab wrote:
> > On Tue, Sep 26, 2017 at 01:37:07PM +0000, Michal Hocko wrote:
> > > On Fri 22-09-17 15:13:56, Reza Arbab wrote:
> > > > The move_pages() syscall can be used to find the numa node where a page
> > > > currently resides. This is not working for device public memory pages,
> > > > which erroneously report -EFAULT (unmapped or zero page).
> > > >
> > > > Enable by adding a FOLL_DEVICE flag for follow_page(), which
> > > > move_pages() will use. This could be done unconditionally, but adding a
> > > > flag seems like a safer change.
> > >
> > > I do not understand the purpose of this patch. What is the numa node of
> > > device memory?
> >
> > Well, using hmm_devmem_pages_create() it is added to this node:
> >
> > 	nid = dev_to_node(device);
> > 	if (nid < 0)
> > 		nid = numa_mem_id();
>
> OK, but do all HMM devices have a concept of NUMA affinity? From the
> code you are pasting they do not have to...

I don't know the definitive answer here, but as Jerome said, PCIe
devices should, and we are heading that way with NVLink/CAPI as well. It
seems the default is just the nearest node.

> > I understand it's minimally useful information to userspace, but the
> > memory does have a nid and move_pages() is supposed to be able to
> > return what that is. I ran into this using a testcase which tries to
> > verify that user addresses were correctly migrated to coherent device
> > memory.
> >
> > That said, I'm okay with dropping this if you don't think it's
> > worthwhile.
>
> I am just worried that we allow information which is not generally
> sensible, and I am also not sure what userspace can actually do with
> that information.

As mentioned, it is minimally useful, e.g. for verifying migration, so
returning the nid seems sensible to me. Alternatively, we might at least
change the documentation to say

-EFAULT
This is a zero page, a device page, or the memory area is not mapped by the process.
                     ^^^^^^^^^^^^^

--
Reza Arbab