Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD"

From: Thorsten Leemhuis
Date: Tue Nov 20 2012 - 12:43:01 EST


On 20.11.2012 16:38, Josh Boyer wrote:
On Fri, Nov 16, 2012 at 3:06 PM, Mel Gorman <mgorman@xxxxxxx> wrote:
On Fri, Nov 16, 2012 at 02:14:47PM -0500, Josh Boyer wrote:
On Mon, Nov 12, 2012 at 6:37 AM, Mel Gorman <mgorman@xxxxxxx> wrote:
With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction
based on failures" reverted, Zdenek Kabelac reported the following

Hmm, so it's just took longer to hit the problem and observe
kswapd0 spinning on my CPU again - it's not as endless like before -
but still it easily eats minutes - it helps to turn off Firefox
or TB (memory hungry apps) so kswapd0 stops soon - and restart
those apps again. (And I still have like >1GB of cached memory)

kswapd0 R running task 0 30 2 0x00000000
ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246
ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8
ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000
Call Trace:
[<ffffffff81555bf2>] preempt_schedule+0x42/0x60
[<ffffffff81557a95>] _raw_spin_unlock+0x55/0x60
[<ffffffff81192971>] put_super+0x31/0x40
[<ffffffff81192a42>] drop_super+0x22/0x30
[<ffffffff81193b89>] prune_super+0x149/0x1b0
[<ffffffff81141e2a>] shrink_slab+0xba/0x510

The sysrq+m indicates the system has no swap so it'll never reclaim
anonymous pages as part of reclaim/compaction. That is one part of the
problem but not the root cause as file-backed pages could also be reclaimed.

The likely underlying problem is that kswapd is woken up or kept awake
for each THP allocation request in the page allocator slow path.

If compaction fails for the requesting process then compaction will be
deferred for a time and direct reclaim is avoided. However, if there
is a storm of THP requests that are simply rejected, it will still
be the case that kswapd is awake for a prolonged period of time
as pgdat->kswapd_max_order is updated each time. This is noticed by
the main kswapd() loop and it will not call kswapd_try_to_sleep().
Instead it will loop, shrinking a small number of pages and calling
shrink_slab() on each iteration.

The temptation is to supply a patch that checks if kswapd was woken for
THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not
backed up by proper testing. As 3.7 is very close to release and this is
not a bug we should release with, a safer path is to revert "mm: remove
__GFP_NO_KSWAPD" for now and revisit it with the view to ironing out the
balance_pgdat() logic in general.

Signed-off-by: Mel Gorman <mgorman@xxxxxxx>

Does anyone know if this is queued to go into 3.7 somewhere? I looked
a bit and can't find it in a tree. We have a few reports of Fedora
rawhide users hitting this.

No, because I was waiting to hear if a) it worked and preferably b) if the
alternative "less safe" option worked. This close to release it might be
better to just go with the safe option.

We've been tracking it in https://bugzilla.redhat.com/show_bug.cgi?id=866988
and people say this revert patch doesn't seem to make the issue go away
fully. Thorsten has created another kernel with the other patch applied
for testing.

At least I think that is the latest status from the bug. Hopefully the
commenters will chime in.

The short story from my current point of view is:

* my main machine at home where I initially saw the issue that started this thread seems to be running fine with rc6 and the "safe" patch Mel posted in https://lkml.org/lkml/2012/11/12/113. Before that I ran an rc5 kernel with the revert that went into rc6 and the "safe" patch -- that worked fine for a few days, too.

* I have a second machine where I started to use 3.7-rc kernels only yesterday (the machine triggered a bug in the radeon driver that seems to be fixed in rc6) which showed symptoms like the ones Zdenek Kabelac mentions in this thread. I wasn't able to look closer at it, but simply tried rc6 with the safe patch, which didn't help. I'm now running rc6 with the "riskier" patch from https://lkml.org/lkml/2012/11/12/151
I can't yet tell if it helps. If the problem shows up again I'll try to capture more debugging data via sysrq -- there wasn't any time for that when I was running rc6 with the safe patch, sorry.

Thorsten