Re: [PATCH v5 1/4] kaslr: add immovable_mem=nn[KMG]@ss[KMG] to specify extracting memory

From: Baoquan He
Date: Thu Jan 04 2018 - 05:31:12 EST


On 01/04/18 at 04:02pm, Chao Fan wrote:
> In current code, kaslr may choose the memory region in movable
> nodes to extract kernel, which will make the nodes can't be hot-removed.
> To solve it, we can specify the memory region in immovable node.
> Create immovable_mem to store the regions in immovable_mem, where should
> be chosen by kaslr.
>
> Also change the "handle_mem_memmap" to "handle_mem_filter", since
> it will not only handle memmap parameter now.
> Since "immovable_mem=" only works with "movable_node", so "immovable_mem="
> doesn't work alone. If specify "movable_node" without "immovable_mem=",
> disable KASLR.
>
> Multiple regions can be specified, comma delimited.
> Considering the usage of memory, only support for 4 regions.
> 4 regions contains 2 nodes at least, enough for kernel to extract.
>
> Signed-off-by: Chao Fan <fanc.fnst@xxxxxxxxxxxxxx>

Hi Chao,

Thanks for your effort on this issue.

Luiz told me they met a hugetlb issue when kaslr enabled on kvm guest.
Please check the below bug information. There's only one available
position which hugepage can use to allocate. In this case, if we have a
generic parameter to tell kernel where we can randomize into, this
hugepage issue can be solved. We can restrict kernel to randomize beyond
[0x40000000, 0x7fffffff]. Not sure if your immovable_mem=nn[KMG]@ss[KMG]
can be adjusted to do this. I am hesitating on whether we should change
this or not.

Hi maintainers, Kees,

1) Let's keep Chao's current code, just ask Luiz to use nokaslr to work
around the hugepage allocation failure;
2) Change immovable_mem=nn[KMG]@ss[KMG] to be a generic parameter,
people can use it to restrict kernel to places they want.

Which one is better? Or any other idea or suggestion?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Luiz said on kvm guest, they add the following to the kernel command-line:
default_hugepagesz=1G hugepagesz=1G hugepages=1

Boot the guest and check number of 1GB pages reserved:

grep HugePages_Total /proc/meminfo

When booting with "nokaslr" HugePages_Total is always 1. When booting
without "nokaslr" sometimes HugePages_Total is zero (that is, reserving
the 1GB page fails).

And 20 reboots, 6 failures to mount single 1G hugepage.

I reproduced on kvm guest with Luiz's help, and found it's because there's
only one available position for 1G hugepage allocation, [0x40000000, 0x7fffffff].
That's why they saw 1/4 possibility of failure. If kernel randomized to
[0x0, 0x3fffffff], [0x80000000, 0xbffdffff], or [0x100000000,
0x13fffffff], hugepage can always succeed to allocate.

dmesg output snippet of kvm guest:
[ +0.000000] e820: BIOS-provided physical RAM map:
[ +0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ +0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ +0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ +0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
[ +0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
[ +0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[ +0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[ +0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000013fffffff] usable

Thanks
Baoquan

> ---
> arch/x86/boot/compressed/kaslr.c | 112 +++++++++++++++++++++++++++++++++++++--
> 1 file changed, 109 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
> index 8199a6187251..60e5aa28b510 100644
> --- a/arch/x86/boot/compressed/kaslr.c
> +++ b/arch/x86/boot/compressed/kaslr.c
> @@ -108,6 +108,19 @@ enum mem_avoid_index {
>
> static struct mem_vector mem_avoid[MEM_AVOID_MAX];
>
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +/* Only supporting at most 4 immovable memory regions with kaslr */
> +#define MAX_IMMOVABLE_MEM 4
> +
> +static bool lack_immovable_mem;
> +
> +/* Store the memory regions in immovable node */
> +static struct mem_vector immovable_mem[MAX_IMMOVABLE_MEM];
> +
> +/* The immovable regions user specify, not more than 4 */
> +static int num_immovable_region;
> +#endif
> +
> static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
> {
> /* Item one is entirely before item two. */
> @@ -206,15 +219,97 @@ static void mem_avoid_memmap(char *str)
> memmap_too_large = true;
> }
>
> -static int handle_mem_memmap(void)
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +static int parse_immovable_mem(char *p,
> + unsigned long long *start,
> + unsigned long long *size)
> +{
> + char *oldp;
> +
> + if (!p)
> + return -EINVAL;
> +
> + oldp = p;
> + *size = memparse(p, &p);
> + if (p == oldp)
> + return -EINVAL;
> +
> + switch (*p) {
> + case '@':
> + *start = memparse(p + 1, &p);
> + return 0;
> + default:
> + /*
> + * If w/o offset, only size specified, immovable_mem=nn[KMG]
> + * has the same behaviour as immovable_mem=nn[KMG]@0. It means
> + * the region starts from 0.
> + */
> + *start = 0;
> + return 0;
> + }
> +
> + return -EINVAL;
> +}
> +
> +static void parse_immovable_mem_regions(char *str)
> +{
> + static int i;
> +
> + while (str && (i < MAX_IMMOVABLE_MEM)) {
> + int rc;
> + unsigned long long start, size;
> + char *k = strchr(str, ',');
> +
> + if (k)
> + *k++ = 0;
> +
> + rc = parse_immovable_mem(str, &start, &size);
> + if (rc < 0)
> + break;
> + str = k;
> +
> + immovable_mem[i].start = start;
> + immovable_mem[i].size = size;
> + i++;
> + }
> + num_immovable_region = i;
> +}
> +#else
> +static inline void parse_immovable_mem_regions(char *str)
> +{
> +}
> +#endif
> +
> +static int handle_mem_filter(void)
> {
> char *args = (char *)get_cmd_line_ptr();
> size_t len = strlen((char *)args);
> + bool enable_movable_node = false;
> char *tmp_cmdline;
> char *param, *val;
> u64 mem_size;
>
> - if (!strstr(args, "memmap=") && !strstr(args, "mem="))
> +#ifdef CONFIG_MEMORY_HOTPLUG
> + if (strstr(args, "movable_node")) {
> + /*
> + * Confirm "movable_node" specified, otherwise
> + * "immovable_mem=" doesn't work.
> + */
> + enable_movable_node = true;
> +
> + /*
> + * If only specify "movable_node" without "immovable_mem=",
> + * disable KASLR.
> + */
> + if (!strstr(args, "immovable_mem=")) {
> + lack_immovable_mem = true;
> + return 0;
> + }
> + }
> +#endif
> +
> + if (!strstr(args, "memmap=") && !strstr(args, "mem=") &&
> + !enable_movable_node)
> return 0;
>
> tmp_cmdline = malloc(len + 1);
> @@ -239,6 +334,9 @@ static int handle_mem_memmap(void)
>
> if (!strcmp(param, "memmap")) {
> mem_avoid_memmap(val);
> + } else if (!strcmp(param, "immovable_mem=") &&
> + enable_movable_node) {
> + parse_immovable_mem_regions(val);
> } else if (!strcmp(param, "mem")) {
> char *p = val;
>
> @@ -378,7 +476,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
> /* We don't need to set a mapping for setup_data. */
>
> /* Mark the memmap regions we need to avoid */
> - handle_mem_memmap();
> + handle_mem_filter();
>
> #ifdef CONFIG_X86_VERBOSE_BOOTUP
> /* Make sure video RAM can be used. */
> @@ -673,6 +771,14 @@ static unsigned long find_random_phys_addr(unsigned long minimum,
> return 0;
> }
>
> +#ifdef CONFIG_MEMORY_HOTPLUG
> + /* Check if specify "movable_node" without "immovable_mem=". */
> + if (lack_immovable_mem) {
> + debug_putstr("Fail KASLR when movable_node specified without immovable_mem=.\n");
> + return 0;
> + }
> +#endif
> +
> /* Make sure minimum is aligned. */
> minimum = ALIGN(minimum, CONFIG_PHYSICAL_ALIGN);
>
> --
> 2.14.3
>
>
>