Re: collision between ZONE_MOVABLE and memblock allocations

From: Michal Hocko
Date: Wed Jul 19 2023 - 04:06:10 EST


On Wed 19-07-23 10:59:52, Mike Rapoport wrote:
> On Wed, Jul 19, 2023 at 08:14:48AM +0200, Michal Hocko wrote:
> > On Tue 18-07-23 16:01:06, Ross Zwisler wrote:
> > [...]
> > > I do think that we need to fix this collision between ZONE_MOVABLE and memmap
> > > allocations, because this issue essentially makes the movablecore= kernel
> > > command line parameter useless in many cases, as the ZONE_MOVABLE region it
> > > creates will often actually be unmovable.
> >
> > movablecore is kind of a hack and I would be more inclined to get rid of it
> > rather than build more into it. Could you be more specific about your
> > use case?
> >
> > > Here are the options I currently see for resolution:
> > >
> > > 1. Change the way ZONE_MOVABLE memory is allocated so that it is allocated from
> > > the beginning of the NUMA node instead of the end. This should fix my use case,
> > > but again is prone to breakage in other configurations (# of NUMA nodes, other
> > > architectures) where ZONE_MOVABLE and memblock allocations might overlap. I
> > > think that this should be relatively straightforward and low risk, though.
> > >
> > > 2. Make the code which processes the movablecore= command line option aware of
> > > the memblock allocations, and have it choose a region for ZONE_MOVABLE which
> > > does not have these allocations. This might be done by checking for
> > > PageReserved() as we do with offlining memory, though that will take some boot
> > > time reordering, or we'll have to figure out the overlap in another way. This
> > > may also result in us having two ZONE_NORMAL zones for a given NUMA node, with
> > > a ZONE_MOVABLE section in between them. I'm not sure if this is allowed?
> >
> > Yes, this is no problem. Zones are allowed to be sparse.
>
> The current initialization order is roughly
>
> * very early initialization with some memblock allocations
> * determine zone locations and sizes
> * initialize memory map
> - memblock_alloc(lots of memory)
> * lots of unrelated initializations that may allocate memory
> * release free pages from memblock to the buddy allocator
>
> With 2) we can make sure the memory map and early allocations won't be in
> ZONE_MOVABLE, but we may still have reserved pages there.
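To make option 2 and the caveat above concrete, here is a toy model in Python, not kernel code. A node is a pfn range [start, end), early memblock reservations are a list of [start, end) ranges, and the hypothetical helper pick_movable_window() looks for the highest-addressed window of the requested size that avoids all of them. The "highest window" preference is an assumption chosen to mirror today's top-of-node placement. The model only sees reservations that exist when it runs, which is exactly the gap for pages reserved later in boot.

```python
def pick_movable_window(node_start, node_end, reserved, size):
    """Toy model (hypothetical helper, not kernel code).

    Return (start, end) of the highest-addressed window of `size` pages
    inside [node_start, node_end) that overlaps none of the `reserved`
    [start, end) ranges, or None if no such window exists.
    """
    # Clip reservations to the node and sort them by start pfn.
    clipped = sorted(
        (max(s, node_start), min(e, node_end))
        for s, e in reserved
        if s < node_end and e > node_start
    )
    # Collect the free gaps between (possibly overlapping) reservations.
    gaps = []
    cursor = node_start
    for s, e in clipped:
        if s > cursor:
            gaps.append((cursor, s))
        cursor = max(cursor, e)
    if cursor < node_end:
        gaps.append((cursor, node_end))
    # Prefer the top of the node, like the current ZONE_MOVABLE placement.
    for gap_start, gap_end in reversed(gaps):
        if gap_end - gap_start >= size:
            return (gap_end - size, gap_end)
    return None
```

For example, with a top-down memmap reservation at the end of the node, pick_movable_window(0, 1000, [(900, 1000)], 200) returns (700, 900). It returns a single contiguous window for simplicity; since zones may be sparse, a fuller version could also combine several smaller gaps.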

Yes, this will always be fragile. If the specific placement of the movable
memory is not important and the only things that matter are its size and
NUMA locality, then an easier-to-maintain solution would be to simply
offline enough memory blocks very early in the userspace bring-up and
online them again as movable. If offlining a block fails, just try another
memory block. This doesn't require any kernel code change.
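A minimal userspace sketch of that approach, assuming the sysfs memory hotplug interface from the kernel's memory-hotplug admin guide: blocks under /sys/devices/system/memory/memoryN, a `state` file that accepts `offline`, `online` and `online_movable`, `block_size_bytes` exported in hex, and memoryN links under /sys/devices/system/node/nodeN. The helper names, the top-of-node ordering and the fallback to plain `online` are illustrative choices, and the sysfs root is a parameter only so the logic can be exercised against a fake tree. It must run as root and early, before the blocks fill up with unmovable allocations.

```python
import os
import re

SYSFS = "/sys/devices/system"  # real sysfs root; a fake tree can be passed instead


def block_size_bytes(sysfs=SYSFS):
    """Memory block size; sysfs exports it as hex without a 0x prefix."""
    with open(os.path.join(sysfs, "memory", "block_size_bytes")) as f:
        return int(f.read().strip(), 16)


def node_blocks(nid, sysfs=SYSFS):
    """Ids of the memory blocks linked under nodeN, highest first."""
    node_dir = os.path.join(sysfs, "node", f"node{nid}")
    ids = []
    for name in os.listdir(node_dir):
        m = re.fullmatch(r"memory(\d+)", name)  # only memoryN entries are blocks
        if m:
            ids.append(int(m.group(1)))
    return sorted(ids, reverse=True)


def set_state(block_id, state, sysfs=SYSFS):
    """Write `state` to memoryN/state; False if the write is rejected."""
    path = os.path.join(sysfs, "memory", f"memory{block_id}", "state")
    try:
        with open(path, "w") as f:
            f.write(state)
        return True
    except OSError:  # e.g. the block still holds unmovable pages
        return False


def make_movable(nid, nbytes, sysfs=SYSFS):
    """Re-online enough blocks of node `nid` as movable to cover `nbytes`.

    Returns the ids of the blocks that ended up online_movable.
    """
    needed = -(-nbytes // block_size_bytes(sysfs))  # round up to whole blocks
    done = []
    for bid in node_blocks(nid, sysfs):
        if len(done) == needed:
            break
        if not set_state(bid, "offline", sysfs):
            continue  # cannot offline this block; try another one
        if set_state(bid, "online_movable", sysfs):
            done.append(bid)
        else:
            set_state(bid, "online", sysfs)  # put the block back online
    return done
```

For instance, make_movable(0, 4 << 30) would try to turn roughly 4 GiB of node 0 into movable memory; the caller should compare the length of the returned list with what it asked for, since there may not be enough offlinable blocks.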
--
Michal Hocko
SUSE Labs