Re: [RFC PATCH 03/26] mm: make pageblock_order 2M per default

From: Johannes Weiner
Date: Tue Apr 18 2023 - 22:55:59 EST


On Wed, Apr 19, 2023 at 03:01:05AM +0300, Kirill A. Shutemov wrote:
> On Tue, Apr 18, 2023 at 03:12:50PM -0400, Johannes Weiner wrote:
> > pageblock_order can be of various sizes, depending on configuration,
> > but the default is MAX_ORDER-1.
>
> Note that MAX_ORDER got redefined in -mm tree recently.
>
> > Given 4k pages, that comes out to
> > 4M. This is a large chunk for the allocator/reclaim/compaction to try
> > to keep grouped per migratetype. It's also unnecessary as the majority
> > of higher order allocations - THP and slab - are smaller than that.
>
> This seems way to x86-specific.

Hey, that's the machines I have access to ;)

> Other arches have larger THP sizes. I believe 16M is common.
>
> Maybe define it as min(MAX_ORDER, PMD_ORDER)?

Hm, let me play around with larger pageblocks.

The thing that gives me pause is that this seems quite aggressive as a
default block size for the allocator and reclaim/compaction - if you
consider the implications for internal fragmentation and the amount of
ongoing defragmentation work it would require.

IOW, it's not just a function of physical page size supported by the
CPU. It's also a function of overall memory capacity. Independent of
architecture, 2MB seems like a more reasonable step up than 16M.

16M is great for TLB coverage, and in our DCs we're getting a lot of
use out of 1G hugetlb pages as well. The question is if those archs
are willing to pay the cost of serving such page sizes quickly and
reliably during runtime; or if that's something better left to setups
with explicit preallocations and stuff like hugetlb_cma reservations.