[PATCH 0/2] thp nodereclaim fixes

From: Michal Hocko
Date: Tue Sep 25 2018 - 08:03:46 EST


Hi,
This regression has been brought up by Andrea [1], who proposed two
different fixes for it. I proposed an alternative fix [2], but in the
end I changed my mind: whatever fix we end up with should be backported
to the stable trees, so a minimal one is preferred. I have therefore gone
back to Andrea's second proposed solution [3] and only reworded the
changelog to reflect the other bug report with the stall information.

My primary concern about [3] was that the __GFP_THISNODE logic should
live in alloc_hugepage_direct_gfpmask. I have done that on top of the
fix as a cleanup (patch 2), which does not need to be backported to the
stable trees.
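To illustrate what keeping the __GFP_THISNODE decision inside a single
gfp-mask helper means, here is a minimal user-space sketch. It is not
the kernel code: the flag values, struct sketch_thp_ctx and
sketch_hugepage_gfpmask() are hypothetical stand-ins, and the rule it
encodes (pin to the local node only when the allocation does not do
direct reclaim, so a reclaiming THP allocation can fall back to another
node instead of reclaiming the local one) is an assumption about the
spirit of [3], not a copy of it.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * HYPOTHETICAL bit values for illustration only. The real gfp_t flags
 * live in include/linux/gfp.h and have different values.
 */
typedef unsigned int sketch_gfp_t;

#define SKETCH_GFP_DIRECT_RECLAIM  0x1u /* stands in for __GFP_DIRECT_RECLAIM */
#define SKETCH_GFP_KSWAPD_RECLAIM  0x2u /* stands in for __GFP_KSWAPD_RECLAIM */
#define SKETCH_GFP_THISNODE        0x4u /* stands in for __GFP_THISNODE */
#define SKETCH_GFP_TRANSHUGE_LIGHT 0x8u /* stands in for GFP_TRANSHUGE_LIGHT */

/*
 * HYPOTHETICAL inputs; the real helper derives these from the VMA flags
 * (e.g. MADV_HUGEPAGE) and the THP defrag setting.
 */
struct sketch_thp_ctx {
	bool want_direct_reclaim; /* defrag mode and/or madvise ask for direct reclaim */
	bool want_kswapd;         /* defrag mode asks to wake kswapd */
};

/*
 * Single place that picks both the reclaim flags and the node pinning.
 * ASSUMPTION: the local-node restriction is applied only to
 * non-reclaiming (optimistic) attempts.
 */
sketch_gfp_t sketch_hugepage_gfpmask(const struct sketch_thp_ctx *ctx)
{
	sketch_gfp_t gfp = SKETCH_GFP_TRANSHUGE_LIGHT;

	/* Reclaiming allocation: allow fallback to other nodes. */
	if (ctx->want_direct_reclaim)
		return gfp | SKETCH_GFP_DIRECT_RECLAIM;

	if (ctx->want_kswapd)
		gfp |= SKETCH_GFP_KSWAPD_RECLAIM;

	/* Optimistic attempt without direct reclaim: stay on the local node. */
	return gfp | SKETCH_GFP_THISNODE;
}
```

The design point is that the node-pinning choice sits next to the
reclaim-flag choice, so the two cannot drift apart between callers.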

I am still not happy that David's workload will regress as a result,
but we should really focus on the default behavior and come up with a
more robust solution for the specialized case of users who have more
restrictive NUMA preferences. I am thinking about a new NUMA policy that
would mimic node reclaim behavior, and I am willing to work on that, but
we really have to fix the regression first, and that is patch 1.

Thoughts, alternative patches?

[1] http://lkml.kernel.org/r/20180820032204.9591-1-aarcange@xxxxxxxxxx
[2] http://lkml.kernel.org/r/20180830064732.GA2656@xxxxxxxxxxxxxx
[3] http://lkml.kernel.org/r/20180820032640.9896-2-aarcange@xxxxxxxxxx