Re: [PATCHv3 1/2] mm/memblock: extend the limit inferior of bottom-up after parsing hotplug attr

From: Baoquan He
Date: Fri Jan 04 2019 - 22:45:05 EST


On 01/04/19 at 05:09pm, Mike Rapoport wrote:
> On Thu, Jan 03, 2019 at 10:47:06AM -0800, Tejun Heo wrote:
> > Hello,
> >
> > On Wed, Jan 02, 2019 at 07:05:38PM +0200, Mike Rapoport wrote:
> > > I agree that currently the bottom-up allocation after the kernel text has
> > > issues with KASLR. But this issues are not necessarily related to the
> > > memory hotplug. Even with a single memory node, a bottom-up allocation will
> > > fail if KASLR would put the kernel near the end of node0.
> > >
> > > What I am trying to understand is whether there is a fundamental reason to
> > > prevent allocations from [0, kernel_start)?
> > >
> > > Maybe Tejun can recall why he suggested to start bottom-up allocations from
> > > kernel_end.
> >
> > That's from 79442ed189ac ("mm/memblock.c: introduce bottom-up
> > allocation mode"). I wasn't involved in that patch, so no idea why
> > the restrictions were added, but FWIW it doesn't seem necessary to me.
>
> I should have added the reference [1] at the first place :)
> Thanks!
>
> [1] https://lore.kernel.org/lkml/20130904192215.GG26609@xxxxxxxxxxxxxx/

As I understand it, we may not be able to drop the bottom-up method in
the current kernel. It is tied to the hotplug feature when the
'movable_node' kernel parameter is specified. With 'movable_node', the
system relies on reading hotplug information from firmware; on x86 that
is the ACPI SRAT table. Currently we allocate memblock regions top-down
by default. However, several memblock allocations happen before the
hotplug information is retrieved, and doing those top-down would break
the hotplug feature, because kernel data would be placed in the movable
zone, which is usually on the last node on bare metal systems.
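
To make the problem concrete, here is a toy model of it. This is only a
sketch, not the kernel's memblock code: struct toy_region and
toy_alloc_top_down() are made-up names, the regions are assumed sorted
by address, and reserved ranges and alignment are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model, NOT the kernel's memblock API. */
struct toy_region {
	uint64_t start;     /* inclusive */
	uint64_t end;       /* exclusive */
	bool hotpluggable;  /* only known once firmware info (e.g. SRAT) is parsed */
};

/*
 * Top-down: carve `size` bytes from the top of the highest region that
 * is large enough. Regions must be sorted by ascending address.
 * Returns the index of the chosen region, or -1 if nothing fits.
 */
static int toy_alloc_top_down(const struct toy_region *r, int n,
			      uint64_t size, uint64_t *base)
{
	assert(r && base && size > 0);

	for (int i = n - 1; i >= 0; i--) {
		if (r[i].end - r[i].start >= size) {
			*base = r[i].end - size;
			return i;
		}
	}
	return -1;
}
```

With two 4G nodes where only the second one is hot-pluggable, a 1M
request is served from the top of the second node, i.e. from memory
that is supposed to stay movable.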

This bottom-up approach is used by many ARCHes, and it works well on
systems where KASLR is not enabled. Below is the grep result from the
current Linux kernel; we can see that many ARCHes use this mechanism,
while arm/arm64 do not. But for now only arm64/mips/x86 have KASLR.

Without KASLR, allocating memblock regions above the kernel end while
the hotplug info is not yet parsed looks very reasonable, since the
kernel is usually placed at a low address, e.g. 16M on x86. My thought
is that we need to do memblock allocation close to the kernel before
the hotplug info is parsed. That is, for systems without KASLR we keep
the current bottom-up way; for systems with KASLR we should allocate
memblock regions top-down just below the kernel start.
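
A rough sketch of that policy, under my own simplifying assumptions: a
single memory range, no reserved regions, no alignment. struct
toy_layout, toy_early_alloc() and TOY_ALLOC_FAILED are hypothetical
names for illustration only; this is not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_ALLOC_FAILED UINT64_MAX

/* Hypothetical description of one memory range and the kernel image in it. */
struct toy_layout {
	uint64_t mem_start;     /* inclusive */
	uint64_t mem_end;       /* exclusive */
	uint64_t kernel_start;  /* inclusive */
	uint64_t kernel_end;    /* exclusive */
};

/*
 * Pick a base for an early allocation made before hotplug info is parsed:
 *  - without KASLR: bottom-up, right above the kernel end;
 *  - with KASLR:    top-down, right below the kernel start.
 * Returns TOY_ALLOC_FAILED if the chosen side has no room.
 */
static uint64_t toy_early_alloc(const struct toy_layout *l, uint64_t size,
				bool kaslr)
{
	assert(l->mem_start <= l->kernel_start);
	assert(l->kernel_start < l->kernel_end);
	assert(l->kernel_end <= l->mem_end);

	if (!kaslr) {
		if (l->mem_end - l->kernel_end >= size)
			return l->kernel_end;
		return TOY_ALLOC_FAILED;
	}

	if (l->kernel_start - l->mem_start >= size)
		return l->kernel_start - size;
	return TOY_ALLOC_FAILED;
}
```

For a kernel at 16M..48M in a 4G range, the non-KASLR path returns 48M.
If the kernel sits in the last 32M of that range, allocating above the
kernel end finds no room, while the KASLR path returns the 1M just
below the kernel start.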

This issue must break hotplug. It has stayed hidden only because bare
metal systems currently need to add 'nokaslr' to disable KASLR while
another bug fix, referenced below, is still under discussion.

[PATCH v14 0/5] x86/boot/KASLR: Parse ACPI table and limit KASLR to choosing immovable memory
lkml.kernel.org/r/20181214093013.13370-1-fanc.fnst@xxxxxxxxxxxxxx

[~ ]$ git grep memblock_set_bottom_up
arch/alpha/kernel/setup.c: memblock_set_bottom_up(true);
arch/m68k/mm/motorola.c: memblock_set_bottom_up(true);
arch/mips/kernel/setup.c: memblock_set_bottom_up(true);
arch/mips/kernel/traps.c: memblock_set_bottom_up(false);
arch/nds32/kernel/setup.c: memblock_set_bottom_up(true);
arch/powerpc/kernel/paca.c: memblock_set_bottom_up(true);
arch/powerpc/kernel/paca.c: memblock_set_bottom_up(false);
arch/s390/kernel/setup.c: memblock_set_bottom_up(true);
arch/s390/kernel/setup.c: memblock_set_bottom_up(false);
arch/sparc/mm/init_32.c: memblock_set_bottom_up(true);
arch/x86/kernel/setup.c: memblock_set_bottom_up(true);
arch/x86/mm/numa.c: memblock_set_bottom_up(false);
include/linux/memblock.h:static inline void __init memblock_set_bottom_up(bool enable)