Re: mmotm woes, mainly compaction

From: Vlastimil Babka
Date: Thu Apr 14 2016 - 19:25:20 EST


On 04/14/2016 10:15 PM, Hugh Dickins wrote:
... because the thing that was really wrong (and I spent far too long
studying compaction.c before noticing in page_alloc.c) was this:

	/*
	 * It can become very expensive to allocate transparent hugepages at
	 * fault, so use asynchronous memory compaction for THP unless it is
	 * khugepaged trying to collapse.
	 */
	if (!is_thp_gfp_mask(gfp_mask) || (current->flags & PF_KTHREAD))
		migration_mode = MIGRATE_SYNC_LIGHT;

Yes, but it advances migration_mode before should_compact_retry() checks
whether it was MIGRATE_ASYNC, so it eliminates the retry when the
MIGRATE_ASYNC attempt compaction_failed(). And the bogus *migrate_mode++
code had appeared to work by interposing an additional state for a retry.
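
Schematically, the ordering in __alloc_pages_slowpath() ends up like this
(a simplified sketch, not the literal mmotm code; should_compact_retry()'s
argument list is abbreviated here):

	/* compaction with the current migration_mode has just failed */

	if (!is_thp_gfp_mask(gfp_mask) || (current->flags & PF_KTHREAD))
		migration_mode = MIGRATE_SYNC_LIGHT;	/* advanced too early */

	/*
	 * Meant to say "the attempt that failed was only MIGRATE_ASYNC, so
	 * retry harder" - but migration_mode was already advanced above, so
	 * this sees MIGRATE_SYNC_LIGHT and declines to retry.
	 */
	if (should_compact_retry(order, compact_result, migration_mode,
				 compaction_retries))
		goto retry;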

Ouch. But AFAICS the is_thp_gfp_mask() path is already addressed properly in Michal's latest versions of the patches (patch 10/11 of "[PATCH 00/11] oom detection rework v5"). So that wasn't what you tested this time? Maybe mmotm still had the older version... But I guess Michal will know better than me.

So all I had to do to get OOM-free results, on both machines, was to
remove those lines quoted above. Now, no doubt it's wrong (for THP)
to remove them completely, but I don't want to second-guess you on
where to do the equivalent check: over to you for that.

I'm over-optimistic when I say OOM-free: on the G5 yes; but I did
see an order=2 OOM after an hour on the laptop one time, and much
sooner when I applied your further three patches (classzone_idx etc),
again on the laptop, on one occasion but not another. Something not
quite right, but much easier to live with than before, and will need
a separate tedious investigation if it persists.

Earlier on, when all my suspicions were in compaction.c, I did make a
couple of probable fixes there, though neither helped me out of my OOMs:

At present MIGRATE_SYNC_LIGHT is allowing __isolate_lru_page() to
isolate a PageWriteback page, which __unmap_and_move() then rejects
with -EBUSY: of course the writeback might complete in between, but
that's not what we usually expect, so probably better not to isolate it.
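
For reference, roughly the two checks involved (paraphrased and trimmed
from 4.6-era mm/vmscan.c and mm/migrate.c, not the exact code):

	/* __isolate_lru_page(): only these isolate modes skip writeback pages */
	if (mode & (ISOLATE_CLEAN|ISOLATE_ASYNC_MIGRATE)) {
		/* All the caller can do on PageWriteback is block */
		if (PageWriteback(page))
			return ret;
		/* dirty-page handling trimmed */
	}

	/* __unmap_and_move(): anything short of full sync fails such a page */
	if (PageWriteback(page)) {
		if (mode != MIGRATE_SYNC) {
			rc = -EBUSY;
			goto out_unlock;
		}
		if (!force)
			goto out_unlock;
		wait_on_page_writeback(page);
	}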

And where compact_zone() sets whole_zone, I tried a BUG_ON if
compact_scanners_met() already, and hit that as a real possibility
(only when compactors are competing, perhaps): without the BUG_ON, it
would proceed to compact_finished() COMPACT_COMPLETE without doing
any work at all - and the new should_compact_retry() code is placing
more faith in that compact_result than perhaps it deserves. No need
to BUG_ON then, just be stricter about setting whole_zone.

I warned Michal about the whole_zone setting being prone to races; thanks for confirming it's real. Although compact_scanners_met() being immediately true, when the migration scanner is at the zone start, sounds really weird. That would mean the free scanner had previously advanced as far as the first pageblock to cache that pfn, which I wouldn't expect except maybe in a tiny zone... was it a ZONE_DMA? Or it could be a sign of a larger issue.

Anyway, too bad fixing this didn't help the rare OOMs as that would be a perfectly valid reason for them.
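
For anyone following along, the path in question is roughly this
(paraphrased from mm/compaction.c of that era, details trimmed):

	/* compact_zone() resumes from the zone's cached scanner positions: */
	cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
	cc->free_pfn = zone->compact_cached_free_pfn;
	/* (out-of-range resets trimmed - see the second hunk below) */

	if (cc->migrate_pfn == start_pfn)
		cc->whole_zone = true;	/* free_pfn may already be <= migrate_pfn */

	/*
	 * ... and __compact_finished() is satisfied as soon as the scanners
	 * meet, even if this run never migrated or freed anything:
	 */
	if (cc->free_pfn <= cc->migrate_pfn)
		return COMPACT_COMPLETE;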

(I do worry about the skip hints: once in MIGRATE_SYNC_LIGHT mode,
compaction seemed good at finding pages to migrate, but not so good
at then finding enough free pages to migrate them into. Perhaps
there were none, but it left me suspicious.)

Skip hints could be another reason for the rare OOMs. What you describe sounds like the reclaim between async and sync_light compaction attempts did free some pages, but their pageblocks were not scanned due to the skip hints.
A test patch setting compact_control.ignore_skip_hint for sync_(light) compaction could confirm (or also mask, I'm afraid) the issue.
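Something like this, in compact_zone_order() (an untested sketch of what
such a test patch could look like; the condition is just one possibility):

	struct compact_control cc = {
		.order = order,
		.gfp_mask = gfp_mask,
		.zone = zone,
		.mode = mode,
		.alloc_flags = alloc_flags,
		.classzone_idx = classzone_idx,
		.direct_compaction = true,
		/* test only: ignore pageblock skip hints unless async */
		.ignore_skip_hint = (mode != MIGRATE_ASYNC),
	};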

Vlastimil, thanks so much for picking up my bits and pieces a couple
of days ago: I think I'm going to impose upon you again with the below,
if that's not too irritating.

No problem :)

Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>

--- 4.6-rc2-mm1/mm/compaction.c 2016-04-11 11:35:08.536604712 -0700
+++ linux/mm/compaction.c 2016-04-13 23:17:03.671959715 -0700
@@ -1190,7 +1190,7 @@ static isolate_migrate_t isolate_migrate
 	struct page *page;
 	const isolate_mode_t isolate_mode =
 		(sysctl_compact_unevictable_allowed ? ISOLATE_UNEVICTABLE : 0) |
-		(cc->mode == MIGRATE_ASYNC ? ISOLATE_ASYNC_MIGRATE : 0);
+		(cc->mode != MIGRATE_SYNC ? ISOLATE_ASYNC_MIGRATE : 0);
 
 	/*
 	 * Start at where we last stopped, or beginning of the zone as

I'll check this optimization, thanks.

@@ -1459,8 +1459,8 @@ static enum compact_result compact_zone(
 		zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
 	}
 
-	if (cc->migrate_pfn == start_pfn)
-		cc->whole_zone = true;
+	cc->whole_zone = cc->migrate_pfn == start_pfn &&
+			 cc->free_pfn == pageblock_start_pfn(end_pfn - 1);
 
 	cc->last_migrated_pfn = 0;

This would be for Michal, but I agree.