Re: [PATCH] mm: Introduce kernelcore=reliable option

From: Xishi Qiu
Date: Thu Oct 22 2015 - 23:40:32 EST


On 2015/10/15 21:32, Taku Izumi wrote:

> Xeon E7 v3 based systems support Address Range Mirroring,
> and a UEFI BIOS compliant with UEFI spec 2.5 can report which
> ranges are reliable (mirrored) via the EFI memory map.
> The Linux kernel now uses this information and allocates
> boot-time memory from the reliable region.
>
> My requirement is:
> - allocate kernel memory from reliable region
> - allocate user memory from non-reliable region
>
> In order to meet my requirement, ZONE_MOVABLE is useful.
> By arranging non-reliable range into ZONE_MOVABLE,
> reliable memory is only used for kernel allocations.
>
> This patch extends existing "kernelcore" option and
> introduces kernelcore=reliable option. By specifying
> "reliable" instead of specifying the amount of memory,
> non-reliable region will be arranged into ZONE_MOVABLE.
>
> Earlier discussion is at:
> https://lkml.org/lkml/2015/10/9/24
>
> For example, suppose 2-nodes system with the following
> memory range:
> node 0 [mem 0x0000000000001000-0x000000109fffffff]
> node 1 [mem 0x00000010a0000000-0x000000209fffffff]
>
> and the following ranges are marked as reliable (*):
> [0x0000000000000000-0x0000000100000000]
> [0x0000000100000000-0x0000000180000000]
> [0x00000010a0000000-0x0000001120000000]
>
> If you specify kernelcore=reliable, Movable zones are
> arranged like the following:
> Movable zone start for each node
> Node 0: 0x0000000180000000
> Node 1: 0x0000001120000000
>
> (*) I specified the following instead of using a UEFI BIOS
> compliant with UEFI spec 2.5:
> efi_fake_mem=4G@0:0x10000,2G@0x10a0000000:0x10000,2G@4G:0x10000
> efi_fake_mem is found at:
> git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi.git
> tags/efi-next
>
> Signed-off-by: Taku Izumi <izumi.taku@xxxxxxxxxxxxxx>
> ---
> Documentation/kernel-parameters.txt | 9 ++++++++-
> mm/page_alloc.c | 26 ++++++++++++++++++++++++++
> 2 files changed, 34 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index cd5312f..b2c8c13 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -1663,7 +1663,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>
> keepinitrd [HW,ARM]
>
> - kernelcore=nn[KMG] [KNL,X86,IA-64,PPC] This parameter
> + kernelcore= Format: nn[KMG] | "reliable"
> + [KNL,X86,IA-64,PPC] This parameter
> specifies the amount of memory usable by the kernel
> for non-movable allocations. The requested amount is
> spread evenly throughout all nodes in the system. The
> @@ -1679,6 +1680,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> use the HighMem zone if it exists, and the Normal
> zone if it does not.
>
> + Instead of specifying the amount of memory (nn[KMG]),
> + you can specify the "reliable" option. When "reliable"
> + is specified, reliable memory is used for
> + non-movable allocations and the remaining memory is used
> + for Movable pages.
> +
> kgdbdbgp= [KGDB,HW] kgdb over EHCI usb debug port.
> Format: <Controller#>[,poll interval]
> The controller # is the number of the ehci usb debug
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index beda417..d0b3ac9 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -221,6 +221,7 @@ static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
> static unsigned long __initdata required_kernelcore;
> static unsigned long __initdata required_movablecore;
> static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
> +static bool reliable_kernelcore __initdata;
>
> /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
> int movable_zone;
> @@ -5618,6 +5619,25 @@ static void __init find_zone_movable_pfns_for_nodes(void)
> }
>
> /*
> + * If kernelcore=reliable is specified, ignore movablecore option
> + */
> + if (reliable_kernelcore) {
> + for_each_memblock(memory, r) {
> + if (memblock_is_mirror(r))
> + continue;
> +
> + nid = r->nid;
> +
> + usable_startpfn = PFN_DOWN(r->base);
> + zone_movable_pfn[nid] = zone_movable_pfn[nid] ?
> + min(usable_startpfn, zone_movable_pfn[nid]) :
> + usable_startpfn;
> + }
> +
> + goto out2;

Hi Taku,

If the user marks 0-1G as mirrored memory, 1-2G is normal memory, and 2-4G
is a hole, will the movable zone then start at 2G?
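
To make the question concrete, here is a rough user-space model of the loop
quoted above. It is only a sketch, not kernel code: struct region,
find_movable_start() and hole_case_movable_start() are hypothetical names
standing in for the memblock region, the patched loop and the layout in
question. It assumes 4 KiB pages and ignores anything that happens after the
out2 label, which this hunk does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB pages */
#define MAX_NODES  2    /* hypothetical limit for this sketch */

/* Hypothetical stand-in for one memblock memory region. */
struct region {
	uint64_t base;   /* physical start address */
	uint64_t size;   /* length in bytes (unused by the loop itself) */
	int      nid;    /* owning NUMA node */
	bool     mirror; /* true if the range is mirrored (reliable) */
};

/*
 * Models the kernelcore=reliable loop: skip mirrored regions and record,
 * per node, the lowest start PFN of any non-mirrored region.
 * movable_pfn[] must be zero-initialised; as in the patch, 0 means
 * "not set yet".
 */
static void find_movable_start(const struct region *r, size_t n,
			       uint64_t movable_pfn[MAX_NODES])
{
	for (size_t i = 0; i < n; i++) {
		if (r[i].mirror)
			continue;

		assert(r[i].nid >= 0 && r[i].nid < MAX_NODES);

		uint64_t start_pfn = r[i].base >> PAGE_SHIFT; /* like PFN_DOWN() */
		uint64_t cur = movable_pfn[r[i].nid];

		movable_pfn[r[i].nid] = (cur && cur < start_pfn) ? cur : start_pfn;
	}
}

/* The layout asked about: 0-1G mirrored, 1-2G normal, 2-4G a hole. */
static uint64_t hole_case_movable_start(void)
{
	const struct region layout[] = {
		{ 0x00000000ULL, 0x40000000ULL, 0, true  },
		{ 0x40000000ULL, 0x40000000ULL, 0, false },
		/* 2G-4G is a hole, so no region describes it. */
	};
	uint64_t movable_pfn[MAX_NODES] = { 0 };

	find_movable_start(layout, 2, movable_pfn);
	return movable_pfn[0] << PAGE_SHIFT;
}
```

In this model hole_case_movable_start() returns 0x40000000, i.e. the base of
the first non-mirrored region (1G), because the loop only looks at regions
that exist and never sees the hole. Whether the real zone boundary ends up
there depends on the code after out2.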

Thanks,
Xishi Qiu

> + }
> +
> + /*
> * If movablecore=nn[KMG] was specified, calculate what size of
> * kernelcore that corresponds so that memory usable for
> * any allocation type is evenly spread. If both kernelcore
> @@ -5873,6 +5893,12 @@ static int __init cmdline_parse_core(char *p, unsigned long *core)
> */
> static int __init cmdline_parse_kernelcore(char *p)
> {
> + /* parse kernelcore=reliable */
> + if (parse_option_str(p, "reliable")) {
> + reliable_kernelcore = true;
> + return 0;
> + }
> +
> return cmdline_parse_core(p, &required_kernelcore);
> }
>

