Re: kswapd craziness in 3.7

From: Thorsten Leemhuis
Date: Wed Nov 28 2012 - 05:51:15 EST


Mel Gorman wrote on 28.11.2012 11:13:
> On Tue, Nov 27, 2012 at 03:19:38PM -0800, Linus Torvalds wrote:
>> On Tue, Nov 27, 2012 at 2:26 PM, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>> > On Tue, Nov 27, 2012 at 05:02:36PM -0500, Rik van Riel wrote:
>
>> And the one who comes out gets to explain to me which patch(es) I
>> should apply, and which I should revert, if any.
>
> Based on the reports I've seen I expect the following to work for 3.7
>
> Keep
> 96710098 mm: revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures"
> ef6c5be6 fix incorrect NR_FREE_PAGES accounting (appears like memory leak)
>
> Revert
> 82b212f4 Revert "mm: remove __GFP_NO_KSWAPD"
>
> Merge
> mm: vmscan: fix kswapd endless loop on higher order allocation
> mm: Avoid waking kswapd for THP allocations when compaction is deferred or contended

I'll build a kernel with this combination and give it a try. Maybe
some of the people who reported problems in
https://bugzilla.redhat.com/show_bug.cgi?id=866988 can try it, too.
Two people there recently reported that their problems were gone with
kernels that contained 82b212f4.

> Johannes' patch should remove the necessity for __GFP_NO_KSWAPD revert but I
> think we should also avoid waking kswapd for THP allocations if compaction
> is deferred. Johannes' patch might mean that kswapd quickly goes back
> to sleep but it's still busy work.

Is there a way to trigger (some benchmark?) and detect (something in
/proc/vmstat?) the problem Hannes' patch tries to fix?
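
One possible way to look for it in /proc/vmstat, as a minimal sketch
only: take two snapshots and print the reclaim-related counters that
moved in between. The substrings in INTERESTING are assumptions chosen
for illustration; which counters exist depends on the kernel version
and config, and nothing in this thread confirms which of them would
move for this particular bug.

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


# Assumption: name fragments picked for illustration, not an official list.
INTERESTING = ("kswapd", "compact_", "thp_", "pageoutrun", "allocstall")


def reclaim_deltas(before, after, needles=INTERESTING):
    """Return {name: change} for matching counters that changed."""
    deltas = {}
    for name, value in after.items():
        if any(needle in name for needle in needles):
            diff = value - before.get(name, 0)
            if diff:
                deltas[name] = diff
    return deltas


def read_vmstat(path="/proc/vmstat"):
    """Take one snapshot of the live counters."""
    with open(path) as f:
        return parse_vmstat(f.read())
```

To use it, call read_vmstat() once, wait while the suspected workload
runs, call it again and pass both snapshots to reclaim_deltas().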

Background: The two main problems that got me into this discussion
vanished thanks to 9671009 (mm: revert "mm: vmscan: scale number of pages
reclaimed by reclaim/compaction based on failures") and ef6c5be (fix
incorrect NR_FREE_PAGES accounting (appears like memory leak)). I
thought all my problems were gone, but after a few days of uptime (I
suspended and resumed that machine a few times in between, as I was
using it only in the evenings) kswapd now and then started consuming
nearly 100% of one CPU core for 10 to 15 seconds at a time (watching a
YouTube video seemed to trigger it, and the machine was using a little
bit of swap space). I had just started debugging this, but due to a
stupid mistake
(https://plus.google.com/107616711159256259828/posts/GXuhf1LTien ) I
then rebooted the machine :-/ So maybe I hit the problem Hannes' patch
tries to solve, but I'm not sure, and I have no easy way to verify
quickly whether the proposed patch combination helps.
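
To put a number on a CPU burst like the one described above, a rough
sketch would be to sample the kswapd thread's CPU time from
/proc/<pid>/stat twice and turn the difference into a percentage of one
core. The helper names are hypothetical. The field positions (utime and
stime as fields 14 and 15) follow the proc(5) man page, and the default
of 100 ticks per second is an assumption; os.sysconf("SC_CLK_TCK")
reports the real value.

```python
import os


def cpu_ticks(stat_line):
    """Return (comm, utime + stime) in clock ticks from a /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may contain spaces, so split around it.
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    comm = stat_line[lparen + 1:rparen]
    rest = stat_line[rparen + 1:].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11] and rest[12].
    utime, stime = int(rest[11]), int(rest[12])
    return comm, utime + stime


def busy_percent(ticks_before, ticks_after, seconds, hz=100):
    """Share of one CPU core used between two samples, in percent."""
    return 100.0 * (ticks_after - ticks_before) / (hz * seconds)


def sample_threads(prefix="kswapd"):
    """Return {pid: ticks} for all tasks whose comm starts with prefix."""
    samples = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                comm, ticks = cpu_ticks(f.read())
        except OSError:
            continue  # the task exited while we were scanning
        if comm.startswith(prefix):
            samples[int(entry)] = ticks
    return samples
```

Calling sample_threads() ten seconds apart and feeding the two tick
counts to busy_percent(..., seconds=10, hz=os.sysconf("SC_CLK_TCK"))
should give values near 100 while kswapd is pegging a core.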

Thorsten