Re: [PATCH 02/10] mm, compaction: report compaction as contended only due to lock contention

From: Vlastimil Babka
Date: Wed Jun 11 2014 - 08:22:38 EST


On 06/11/2014 03:10 AM, Minchan Kim wrote:
On Mon, Jun 09, 2014 at 11:26:14AM +0200, Vlastimil Babka wrote:
Async compaction aborts when it detects zone lock contention or need_resched()
is true. David Rientjes has reported that in practice, most direct async
compactions for THP allocation abort due to need_resched(). This means that a
second direct compaction is never attempted, which might be OK for a page
fault, but khugepaged is intended to attempt a sync compaction in such a case,
and currently it won't.

This patch replaces "bool contended" in compact_control with an enum that
distinguishes between aborting due to need_resched() and aborting due to lock
contention. This allows propagating the abort through all compaction functions
as before, but declaring the direct compaction as contended only when lock
contention has been detected.

As a result, khugepaged will proceed with a second sync compaction as intended,
when the preceding async compaction aborted due to need_resched().
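
To illustrate the description above, here is a minimal userspace sketch of the classification and of what gets reported to the allocator. It is not kernel code: the enum mirrors the one the patch adds, but classify_contention(), async_should_abort() and report_contended() are hypothetical helper names, and the two bool parameters stand in for need_resched() and spin_is_contended().

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; the enum values mirror the ones added by the patch. */
enum compact_contended {
	COMPACT_CONTENDED_NONE = 0,	/* no contention detected */
	COMPACT_CONTENDED_SCHED,	/* need_resched() was true */
	COMPACT_CONTENDED_LOCK,		/* zone lock or lru_lock was contended */
};

/* The patch tests the enum as a boolean, which relies on NONE being zero. */
static_assert(COMPACT_CONTENDED_NONE == 0, "NONE must evaluate as false");

/*
 * Hypothetical stand-in for should_release_lock(): the bool arguments
 * replace the need_resched() and spin_is_contended() calls.
 */
static enum compact_contended classify_contention(bool resched_needed,
						   bool lock_contended)
{
	if (resched_needed)
		return COMPACT_CONTENDED_SCHED;
	if (lock_contended)
		return COMPACT_CONTENDED_LOCK;
	return COMPACT_CONTENDED_NONE;
}

/* Async compaction still aborts for either reason, as before the patch. */
static bool async_should_abort(enum compact_contended reason)
{
	return reason != COMPACT_CONTENDED_NONE;
}

/* Only lock contention is signalled back to the allocator. */
static bool report_contended(enum compact_contended reason)
{
	return reason == COMPACT_CONTENDED_LOCK;
}
```

Note that, as in the patch, the need_resched() condition is checked first, so a case where both conditions hold is classified as SCHED and is not reported as contended.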

You said "second direct compaction is never attempted, which might be OK
for a page fault" and said "khugepaged is intended to attempt a sync compaction"
so I feel you want to treat khugepaged specially, unlike other direct compaction
(e.g. page fault).

Well khugepaged is my primary concern, but I imagine there are other direct compaction users besides THP page fault and khugepaged.

With this patch, direct compaction takes care only of lock contention, not
rescheduling, which raises some questions.

Is it really okay not to consider need_resched in direct compaction?

It still considers need_resched() to back off from async compaction. It's only about signaling contended_compaction back to __alloc_pages_slowpath(). There's this code executed after the first, async compaction fails:

	/*
	 * It can become very expensive to allocate transparent hugepages at
	 * fault, so use asynchronous memory compaction for THP unless it is
	 * khugepaged trying to collapse.
	 */
	if (!(gfp_mask & __GFP_NO_KSWAPD) || (current->flags & PF_KTHREAD))
		migration_mode = MIGRATE_SYNC_LIGHT;

	/*
	 * If compaction is deferred for high-order allocations, it is because
	 * sync compaction recently failed. If this is the case and the caller
	 * requested a movable allocation that does not heavily disrupt the
	 * system then fail the allocation instead of entering direct reclaim.
	 */
	if ((deferred_compaction || contended_compaction) &&
						(gfp_mask & __GFP_NO_KSWAPD))
		goto nopage;

Both THP page fault and khugepaged use __GFP_NO_KSWAPD. The first if() decides whether the second attempt will be sync (for khugepaged) or async (page fault). The second if() decides that if compaction was contended, then there won't be any second attempt (and reclaim) at all. Counting need_resched() as contended in this case is bad for khugepaged. Even for page fault it means neither direct reclaim nor a second async compaction. David says need_resched() occurs so often that it is a poor heuristic to decide this.
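
The combined effect of the two if() statements can be modelled in userspace roughly as follows. This is only a sketch under simplifying assumptions: enum second_attempt and after_first_async_failure() are hypothetical names, the bool parameters stand in for the gfp_mask and current->flags tests, and the direct reclaim step that precedes the second compaction attempt is left out.

```c
#include <stdbool.h>

/* Hypothetical outcome type; it does not exist in the kernel. */
enum second_attempt {
	SECOND_NONE,		/* goto nopage: no reclaim, no second compaction */
	SECOND_ASYNC,		/* second compaction stays MIGRATE_ASYNC */
	SECOND_SYNC_LIGHT,	/* second compaction uses MIGRATE_SYNC_LIGHT */
};

/*
 * no_kswapd models (gfp_mask & __GFP_NO_KSWAPD), kthread models
 * (current->flags & PF_KTHREAD), deferred and contended model
 * deferred_compaction and contended_compaction.
 *
 * The kernel sets the migration mode first and checks the bail-out
 * second; the outcome is the same, because bailing out discards the mode.
 */
static enum second_attempt after_first_async_failure(bool no_kswapd,
						      bool kthread,
						      bool deferred,
						      bool contended)
{
	/* Second if(): THP-style allocations give up when deferred/contended. */
	if ((deferred || contended) && no_kswapd)
		return SECOND_NONE;

	/* First if(): everything except a THP page fault may go sync. */
	if (!no_kswapd || kthread)
		return SECOND_SYNC_LIGHT;

	return SECOND_ASYNC;
}
```

For khugepaged (no_kswapd and kthread both true), a contended first attempt yields SECOND_NONE, which is why counting need_resched() as contention prevents the sync attempt; with contended false the same caller gets SECOND_SYNC_LIGHT.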

We have taken care of it in the direct reclaim path, so why is direct
compaction so special?

I admit I'm not that familiar with reclaim but I didn't quickly find any need_resched() there? There's plenty of cond_resched() but that doesn't mean it will abort? Could you explain for me?

Why does khugepaged give up so easily if lock contention/need_resched happens?
khugepaged is important for the success ratio, as I read your description, so
IMO khugepaged should compact synchronously without considering an early
bailout due to lock contention or rescheduling.

Well a stupid answer is that's how __alloc_pages_slowpath() works :) I don't think it's bad to try using first a more lightweight approach before trying the heavyweight one. As long as the heavyweight one is not skipped for khugepaged.

If it causes problems, the user should increase
scan_sleep_millisecs/alloc_sleep_millisecs, which are exactly the knobs for
such cases.

So, my point is: how about making khugepaged always do dumb synchronous
compaction through PG_KHUGEPAGED or GFP_SYNC_TRANSHUGE?


Reported-by: David Rientjes <rientjes@xxxxxxxxxx>
Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>
Cc: Minchan Kim <minchan@xxxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
Cc: Michal Nazarewicz <mina86@xxxxxxxxxx>
Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Cc: Christoph Lameter <cl@xxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxx>
---
mm/compaction.c | 20 ++++++++++++++------
mm/internal.h | 15 +++++++++++----
2 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index b73b182..d37f4a8 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -185,9 +185,14 @@ static void update_pageblock_skip(struct compact_control *cc,
}
#endif /* CONFIG_COMPACTION */

-static inline bool should_release_lock(spinlock_t *lock)
+static inline enum compact_contended should_release_lock(spinlock_t *lock)
{
- return need_resched() || spin_is_contended(lock);
+ if (need_resched())
+ return COMPACT_CONTENDED_SCHED;
+ else if (spin_is_contended(lock))
+ return COMPACT_CONTENDED_LOCK;
+ else
+ return COMPACT_CONTENDED_NONE;
}

/*
@@ -202,7 +207,9 @@ static inline bool should_release_lock(spinlock_t *lock)
static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags,
bool locked, struct compact_control *cc)
{
- if (should_release_lock(lock)) {
+ enum compact_contended contended = should_release_lock(lock);
+
+ if (contended) {
if (locked) {
spin_unlock_irqrestore(lock, *flags);
locked = false;
@@ -210,7 +217,7 @@ static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags,

/* async aborts if taking too long or contended */
if (cc->mode == MIGRATE_ASYNC) {
- cc->contended = true;
+ cc->contended = contended;
return false;
}

@@ -236,7 +243,7 @@ static inline bool compact_should_abort(struct compact_control *cc)
/* async compaction aborts if contended */
if (need_resched()) {
if (cc->mode == MIGRATE_ASYNC) {
- cc->contended = true;
+ cc->contended = COMPACT_CONTENDED_SCHED;
return true;
}

@@ -1095,7 +1102,8 @@ static unsigned long compact_zone_order(struct zone *zone, int order,
VM_BUG_ON(!list_empty(&cc.freepages));
VM_BUG_ON(!list_empty(&cc.migratepages));

- *contended = cc.contended;
+ /* We only signal lock contention back to the allocator */
+ *contended = cc.contended == COMPACT_CONTENDED_LOCK;
return ret;
}

diff --git a/mm/internal.h b/mm/internal.h
index 7f22a11f..4659e8e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -117,6 +117,13 @@ extern int user_min_free_kbytes;

#if defined CONFIG_COMPACTION || defined CONFIG_CMA

+/* Used to signal whether compaction detected need_resched() or lock contention */
+enum compact_contended {
+ COMPACT_CONTENDED_NONE = 0, /* no contention detected */
+ COMPACT_CONTENDED_SCHED, /* need_resched() was true */
+ COMPACT_CONTENDED_LOCK, /* zone lock or lru_lock was contended */
+};
+
/*
* in mm/compaction.c
*/
@@ -144,10 +151,10 @@ struct compact_control {
int order; /* order a direct compactor needs */
int migratetype; /* MOVABLE, RECLAIMABLE etc */
struct zone *zone;
- bool contended; /* True if a lock was contended, or
- * need_resched() true during async
- * compaction
- */
+ enum compact_contended contended; /* Signal need_resched() or lock
+ * contention detected during
+ * compaction
+ */
};

unsigned long
--
1.8.4.5
