Re: [PATCHv14 5/9] efi: Add unaccepted memory support

From: Mel Gorman
Date: Wed Jul 12 2023 - 05:19:07 EST


On Tue, Jul 04, 2023 at 05:37:40PM +0300, Kirill A. Shutemov wrote:
> On Mon, Jul 03, 2023 at 02:25:18PM +0100, Mel Gorman wrote:
> > On Tue, Jun 06, 2023 at 05:26:33PM +0300, Kirill A. Shutemov wrote:
> > > efi_config_parse_tables() reserves memory that holds unaccepted memory
> > > configuration table so it won't be reused by page allocator.
> > >
> > > Core-mm requires few helpers to support unaccepted memory:
> > >
> > > - accept_memory() checks the range of addresses against the bitmap and
> > > accept memory if needed.
> > >
> > > - range_contains_unaccepted_memory() checks if anything within the
> > > range requires acceptance.
> > >
> > > Architectural code has to provide efi_get_unaccepted_table() that
> > > returns pointer to the unaccepted memory configuration table.
> > >
> > > arch_accept_memory() handles arch-specific part of memory acceptance.
> > >
> > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> > > Reviewed-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
> > > Reviewed-by: Tom Lendacky <thomas.lendacky@xxxxxxx>
> >
> > By and large, this looks ok from the page allocator perspective as the
> > checks for unaccepted are mostly after watermark checks. However, if you
> > look in the initial fast path, you'll see this
> >
> > /*
> > * Forbid the first pass from falling back to types that fragment
> > * memory until all local zones are considered.
> > */
> > alloc_flags |= alloc_flags_nofragment(ac.preferred_zoneref->zone, gfp);
> >
> > While checking watermarks should be fine from a functional perspective and
> > the fast paths are unaffected, there is a risk of premature fragmentation
> > until all memory has been accepted. Meeting watermarks does not necessarily
> > mean that fragmentation is avoided as pageblocks can get mixed while still
> > meeting watermarks.
>
> Could you elaborate on this scenario?
>
> Current code checks the watermark, if it is met, try rmqueue().
>
> If rmqueue() fails anyway, try to accept more pages and retry the zone if
> it is successful.
>
> I'm not sure how we can get to the 'if (no_fallback) {' case with any
> unaccepted memory in the allowed zones.
>

Let's take an extreme example and assume that the low watermark is lower
than 2MB (one pageblock). Just before the watermark is reached (free
count between 1MB and 2MB), it is unlikely that all free pages are within
pageblocks of the same migratetype (e.g. MIGRATE_MOVABLE). If there is an
allocation near the watermark of a different type (e.g. MIGRATE_UNMOVABLE)
then the page allocation could fall back to a different pageblock, which is
now mixed. It's a condition that is only obvious if you are explicitly
checking for it via tracepoints. This can happen in the normal case, but
unaccepted memory makes it worse because the "pageblock mixing" could have
been avoided if the "no_fallback" case accepted at least one new pageblock
instead of mixing pageblocks.

That is an extreme example but the same logic applies when the free
count is at or near MIGRATE_TYPES*pageblock_nr_pages as it is not
guaranteed that the pageblocks with free pages are a migratetype that
matches the allocation request.
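
To make the condition concrete, here is a rough userspace model, not kernel
code. Every model_* name is made up for illustration, and the watermark
comparison is deliberately simplified. It only shows that "watermark met" and
"a pageblock of the requested migratetype has free pages" are independent
checks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a zone; none of this is kernel API. */
enum model_migratetype { MODEL_UNMOVABLE, MODEL_MOVABLE, MODEL_RECLAIMABLE };

struct model_pageblock {
	enum model_migratetype type;	/* migratetype the pageblock is tagged with */
	unsigned long nr_free;		/* free base pages left in this pageblock */
};

/* Sum of free pages across all pageblocks in the modelled zone. */
static unsigned long model_total_free(const struct model_pageblock *pb, size_t n)
{
	unsigned long total = 0;

	for (size_t i = 0; i < n; i++)
		total += pb[i].nr_free;
	return total;
}

/* Simplified watermark check: only the global free count matters. */
static bool model_watermark_ok(const struct model_pageblock *pb, size_t n,
			       unsigned long watermark)
{
	return model_total_free(pb, n) > watermark;
}

/* Is there any free page in a pageblock of the requested migratetype? */
static bool model_has_matching_free(const struct model_pageblock *pb, size_t n,
				    enum model_migratetype type)
{
	for (size_t i = 0; i < n; i++)
		if (pb[i].type == type && pb[i].nr_free > 0)
			return true;
	return false;
}

/*
 * The watermark is met, yet the request can only be satisfied by taking
 * pages from a pageblock of another migratetype, i.e. by mixing.
 */
static bool model_would_mix(const struct model_pageblock *pb, size_t n,
			    unsigned long watermark,
			    enum model_migratetype type)
{
	return model_watermark_ok(pb, n, watermark) &&
	       !model_has_matching_free(pb, n, type);
}
```

With two MOVABLE pageblocks holding 300 and 100 free pages and a watermark of
256 pages, model_would_mix() is true for an UNMOVABLE request and false for
a MOVABLE one, even though the watermark is met in both cases.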

Hence, it may be more robust from a fragmentation perspective if
ALLOC_NOFRAGMENT requests accept memory, when any is available, and retry
before clearing ALLOC_NOFRAGMENT and mixing pageblocks ahead of the
watermarks being reached.
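
The ordering I have in mind can be sketched as a small decision function.
This is a toy model under my own assumptions, not the patch and not the real
get_page_from_freelist() logic. The enum, the function name and the three
boolean inputs are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical outcomes for one pass over a zone; not kernel API. */
enum model_action {
	MODEL_ALLOCATE,		/* a matching pageblock has free pages */
	MODEL_ACCEPT_AND_RETRY,	/* accept more memory, then retry the zone */
	MODEL_DROP_NOFRAGMENT,	/* clear the no-fragment restriction and retry */
	MODEL_FALLBACK_MIX,	/* take pages from another migratetype */
};

/*
 * Suggested ordering: while the request is still in no-fragment mode,
 * prefer accepting unaccepted memory over dropping the restriction.
 */
static enum model_action model_next_step(bool matching_free,
					 bool nofragment,
					 bool has_unaccepted)
{
	if (matching_free)
		return MODEL_ALLOCATE;
	if (nofragment && has_unaccepted)
		return MODEL_ACCEPT_AND_RETRY;
	if (nofragment)
		return MODEL_DROP_NOFRAGMENT;
	return MODEL_FALLBACK_MIX;
}
```

The point of the sketch is only the second branch: mixing pageblocks becomes
the last resort, reached after unaccepted memory has been used up.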

> I see that there's preferred_zoneref and spread_dirty_pages cases, but
> unaccepted memory seems change nothing for them.
>

preferred_zoneref is about premature zone exhaustion and
spread_dirty_pages is about avoiding premature stalls on a node/zone due
to an imbalance in the number of pages waiting for writeback to
complete. There is an argument to be made that they also should accept
memory but it's less clear how much of a problem this is. Both are very
obvious when they "fail" and likely are covered by the existing
watermark checks. Premature pageblock mixing is more subtle as the final
impact (root cause of a premature THP allocation failure) is harder to
detect.

--
Mel Gorman
SUSE Labs