Re: [RFC 4/4] mm: Ignore cpuset enforcement when allocation flag has __GFP_THISNODE

From: Dave Hansen
Date: Wed Nov 30 2016 - 14:44:43 EST


On 11/30/2016 03:17 AM, Anshuman Khandual wrote:
> Right, but what is the rationale behind this? This is what is in the in-code
> documentation for this function __cpuset_node_allowed().
>
> * GFP_KERNEL - any node in enclosing hardwalled cpuset ok
>
> If the allocation has requested GFP_KERNEL, should it not look across the
> entire system for memory? Does the cpuset still have to be enforced?

Documentation/cgroup-v1/cpusets.txt explains it quite a bit.

>> What exactly are the kernel-internal places that need to allocate from
>> the coherent device node? When would this be done out of the context of
>> an application *asking* for memory in the new node?
>
> The primary user right now is a driver that wants to move mapped pages
> of an application from system RAM to CDM nodes and back. If the
> application has requested it through an ioctl(), during migration
> the destination pages will be allocated on the CDM *in* the task context.

Side note: uhh, so you're doing migrate_pages() through some kind of new
ioctl()? Why?

I think you're actually pointing out a hole in how cpusets currently
work, especially around the workqueue. I'm not quite sure if this is by
design for migrate_pages() (a task doing migrate_pages() can allocate
pages for a task from a cpuset even though that task isn't able to
allocate there itself).

> The driver could also have scheduled migration chunks in the work queue
> which can execute later on. IIUC that execution and the corresponding
> allocation into the CDM node will be *out* of context of the task.

Yeah, the current->mems_allowed in __cpuset_node_allowed() does seem
rather wrong for something happening in another task's context.
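
To make the concern concrete, here is a toy userspace model of the check.
It is not kernel code: toy_nodemask, struct toy_task, toy_node_allowed()
and toy_contexts_disagree() are all made-up names, and the only thing
borrowed from the real __cpuset_node_allowed() is the idea that the answer
depends on the mems_allowed of whichever task happens to be running. It
assumes the worker thread has a wider mems_allowed than the application.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: one bit per NUMA node, NOT the kernel's nodemask_t. */
typedef unsigned long toy_nodemask;

#define TOY_MAX_NODES ((int)(8 * sizeof(toy_nodemask)))

/* Hypothetical stand-in for the few task fields that matter here. */
struct toy_task {
	const char *comm;          /* task name, for readability only */
	toy_nodemask mems_allowed; /* nodes this task's cpuset permits */
};

/* Return true if @node is set in @mask. */
static bool toy_node_isset(int node, toy_nodemask mask)
{
	assert(node >= 0 && node < TOY_MAX_NODES);
	return (mask >> node) & 1UL;
}

/*
 * Simplified analogue of a cpuset check: the decision is made purely
 * from the mems_allowed of @running, i.e. whoever executes the code.
 */
static bool toy_node_allowed(const struct toy_task *running, int node)
{
	return toy_node_isset(node, running->mems_allowed);
}

/*
 * Return true when the same allocation gets a different answer depending
 * on whether it runs in the application's context (the ioctl() path) or
 * in a worker's context (the deferred work queue path).
 */
static bool toy_contexts_disagree(const struct toy_task *app,
				  const struct toy_task *worker, int node)
{
	return toy_node_allowed(app, node) != toy_node_allowed(worker, node);
}
```

With an application confined to node 0 and a worker allowed on nodes 0-2,
toy_contexts_disagree(&app, &worker, 2) is true: the ioctl() path would be
refused node 2 while the deferred path would be granted it, even though
both are allocating on behalf of the same application.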