Re: [PATCH] mm/mempolicy: Fix an incorrect rebind node in mpol_rebind_nodemask

From: zhong jiang
Date: Mon May 27 2019 - 10:01:39 EST


On 2019/5/27 20:23, Vlastimil Babka wrote:
> On 5/25/19 8:28 PM, Andrew Morton wrote:
>> (Cc Vlastimil)
> Oh dear, 2 years and I forgot all the details about how this works.
>
>> On Sat, 25 May 2019 15:07:23 +0800 zhong jiang <zhongjiang@xxxxxxxxxx> wrote:
>>
>>> We bind an different node to different vma, Unluckily,
>>> it will bind different vma to same node by checking the /proc/pid/numa_maps.
>>> Commit 213980c0f23b ("mm, mempolicy: simplify rebinding mempolicies when updating cpusets")
>>> has introduced the issue. when we change memory policy by seting cpuset.mems,
>>> A process will rebind the specified policy more than one times.
>>> if the cpuset_mems_allowed is not equal to user specified nodes. hence the issue will trigger.
>>> Maybe result in the out of memory which allocating memory from same node.
> I have a hard time understanding what the problem is. Could you please
> write it as a (pseudo) reproducer? I.e. an example of the process/admin
> mempolicy/cpuset actions that have some wrong observed results vs the
> correct expected result.
Sorry, I havn't an testcase to reproduce the issue. At first, It was disappeared by
my colleague to configure the xml to start an vm. To his suprise, The bind mempolicy
doesn't work.

Thanks,
zhong jiang
>>> --- a/mm/mempolicy.c
>>> +++ b/mm/mempolicy.c
>>> @@ -345,7 +345,7 @@ static void mpol_rebind_nodemask(struct mempolicy *pol, const nodemask_t *nodes)
>>> else {
>>> nodes_remap(tmp, pol->v.nodes,pol->w.cpuset_mems_allowed,
>>> *nodes);
>>> - pol->w.cpuset_mems_allowed = tmp;
>>> + pol->w.cpuset_mems_allowed = *nodes;
> Looks like a mechanical error on my side when removing the code for
> step1+step2 rebinding. Before my commit there was
>
> pol->w.cpuset_mems_allowed = step ? tmp : *nodes;
>
> Since 'step' was removed and thus 0, I should have used *nodes indeed.
> Thanks for catching that.
>
>>> }
>>>
>>> if (nodes_empty(tmp))
>> hm, I'm not surprised the code broke. What the heck is going on in
>> there? It used to have a perfunctory comment, but Vlastimil deleted
>> it.
> Yeah the comment was specific for the case that was being removed.
>
>> Could someone please propose a comment for the above code block
>> explaining why we're doing what we do?
> I'll have to relearn this first...
>
>