Re: [PATCH] Revert "mm: vmscan: scale number of pages reclaimed byreclaim/compaction based on failures"

From: Johannes Hirte
Date: Wed Nov 14 2012 - 16:44:09 EST


Am Fri, 9 Nov 2012 08:36:37 +0000
schrieb Mel Gorman <mgorman@xxxxxxx>:

> On Tue, Nov 06, 2012 at 11:15:54AM +0100, Johannes Hirte wrote:
> > Am Mon, 5 Nov 2012 14:24:49 +0000
> > schrieb Mel Gorman <mgorman@xxxxxxx>:
> >
> > > Jiri Slaby reported the following:
> > >
> > > (It's an effective revert of "mm: vmscan: scale number of
> > > pages reclaimed by reclaim/compaction based on failures".) Given
> > > kswapd had hours of runtime in ps/top output yesterday in the
> > > morning and after the revert it's now 2 minutes in sum for the
> > > last 24h, I would say, it's gone.
> > >
> > > The intention of the patch in question was to compensate for the
> > > loss of lumpy reclaim. Part of the reason lumpy reclaim worked is
> > > because it aggressively reclaimed pages and this patch was meant
> > > to be a sane compromise.
> > >
> > > When compaction fails, it gets deferred and both compaction and
> > > reclaim/compaction is deferred avoid excessive reclaim. However,
> > > since commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is
> > > woken up each time and continues reclaiming which was not taken
> > > into account when the patch was developed.
> > >
> > > Attempts to address the problem ended up just changing the shape
> > > of the problem instead of fixing it. The release window gets
> > > closer and while a THP allocation failing is not a major problem,
> > > kswapd chewing up a lot of CPU is. This patch reverts "mm:
> > > vmscan: scale number of pages reclaimed by reclaim/compaction
> > > based on failures" and will be revisited in the future.
> > >
> > > Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
> > > ---
> > > mm/vmscan.c | 25 -------------------------
> > > 1 file changed, 25 deletions(-)
> > >
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index 2624edc..e081ee8 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -1760,28 +1760,6 @@ static bool in_reclaim_compaction(struct
> > > scan_control *sc) return false;
> > > }
> > >
> > > -#ifdef CONFIG_COMPACTION
> > > -/*
> > > - * If compaction is deferred for sc->order then scale the number
> > > of pages
> > > - * reclaimed based on the number of consecutive allocation
> > > failures
> > > - */
> > > -static unsigned long scale_for_compaction(unsigned long
> > > pages_for_compaction,
> > > - struct lruvec *lruvec, struct
> > > scan_control *sc) -{
> > > - struct zone *zone = lruvec_zone(lruvec);
> > > -
> > > - if (zone->compact_order_failed <= sc->order)
> > > - pages_for_compaction <<=
> > > zone->compact_defer_shift;
> > > - return pages_for_compaction;
> > > -}
> > > -#else
> > > -static unsigned long scale_for_compaction(unsigned long
> > > pages_for_compaction,
> > > - struct lruvec *lruvec, struct
> > > scan_control *sc) -{
> > > - return pages_for_compaction;
> > > -}
> > > -#endif
> > > -
> > > /*
> > > * Reclaim/compaction is used for high-order allocation
> > > requests. It reclaims
> > > * order-0 pages before compacting the zone.
> > > should_continue_reclaim() returns @@ -1829,9 +1807,6 @@ static
> > > inline bool should_continue_reclaim(struct lruvec *lruvec,
> > > * inactive lists are large enough, continue reclaiming
> > > */
> > > pages_for_compaction = (2UL << sc->order);
> > > -
> > > - pages_for_compaction =
> > > scale_for_compaction(pages_for_compaction,
> > > - lruvec, sc);
> > > inactive_lru_pages = get_lru_size(lruvec,
> > > LRU_INACTIVE_FILE); if (nr_swap_pages > 0)
> > > inactive_lru_pages += get_lru_size(lruvec,
> > > LRU_INACTIVE_ANON); --
> >
> > Even with this patch I see kswapd0 very often on top. Much more than
> > with kernel 3.6.
>
> How severe is the CPU usage? The higher usage can be explained by "mm:
> remove __GFP_NO_KSWAPD" which allows kswapd to compact memory to
> reduce the amount of time processes spend in compaction but will
> result in the CPU cost being incurred by kswapd.
>
> Is it really high like the bug was reporting with high usage over long
> periods of time or do you just see it using 2-6% of CPU for short
> periods?

It is really high. I've seen with compile-jobs (make -j4 on dual
core) kswapd0 consuming at least 50% CPU most time.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/