Re: KASLR causes intermittent boot failures on some systems

From: Baoquan He
Date: Wed Apr 19 2017 - 10:55:45 EST


On 04/19/17 at 07:27am, Thomas Garnier wrote:
> On Wed, Apr 19, 2017 at 6:36 AM, Baoquan He <bhe@xxxxxxxxxx> wrote:
> > Hi all,
> >
> > I logged in to Jeff's system and added debug code, but found no clue.
> > However, DaveY found that if he disables only the page_offset
> > randomization, the EFI issue does not show up on his system with KASLR
> > enabled. I tried the same on Jeff's pmem system and got the same
> > result: I rebooted several times and it booted successfully every
> > time. The current code does not use __PAGE_OFFSET_BASE directly
> > anywhere, so I don't know why it failed.
>
> Great! I still cannot repro it.
>
> >
> > Does anyone have an idea or hint I can try? I read the pmem code around
> > devm_nsio_enable/pmem_attach_disk/arch_add_memory but have no idea yet.
>
> I would test a couple of things:
> - Set page_offset_base to 0 by default and set it to
> __PAGE_OFFSET_BASE in kernel_randomize_memory (without randomizing
> it). If it crashes on a low address, it might be due to using __va or
> PAGE_OFFSET in general before randomization is done.
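A user-space model of the first suggestion, for illustration only. model_va(), model_page_offset_base, model_kernel_randomize_memory() and MODEL_PAGE_OFFSET_BASE are hypothetical stand-ins for __va(), page_offset_base, kernel_randomize_memory() and __PAGE_OFFSET_BASE. The base value assumes x86_64 4-level paging. With the base defaulting to 0, any translation made before the base is installed equals the physical address. That is the low-address signature Thomas describes.

```c
#include <stdint.h>

/*
 * User-space model only. MODEL_PAGE_OFFSET_BASE stands in for the x86_64
 * 4-level-paging __PAGE_OFFSET_BASE; model_page_offset_base stands in for
 * page_offset_base. All names here are hypothetical.
 */
#define MODEL_PAGE_OFFSET_BASE 0xffff880000000000ULL

/* Debug default: 0 instead of MODEL_PAGE_OFFSET_BASE, so a translation
 * done before "randomization" yields a low (physical-looking) address. */
uint64_t model_page_offset_base = 0;

/* Simplified __va(): physical address plus the direct-map base. */
uint64_t model_va(uint64_t phys)
{
	return phys + model_page_offset_base;
}

/* Models the debug variant of kernel_randomize_memory(): install the
 * default base without randomizing it. */
void model_kernel_randomize_memory(void)
{
	model_page_offset_base = MODEL_PAGE_OFFSET_BASE;
}
```

Calling model_va(0x1000000) before model_kernel_randomize_memory() returns 0x1000000. After the call it returns 0xffff880001000000. An early caller that cached the first result would keep the low address.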

Thanks, Thomas!

I changed the code as below; it should have the same effect as what you suggested.

@@ -140,6 +140,8 @@ void __init kernel_randomize_memory(void)
 		 * Select a random virtual address using the extra entropy
 		 * available.
 		 */
+		if (i == 0)
+			continue;
 		entropy = remain_entropy / (ARRAY_SIZE(kaslr_regions) - i);

I haven't seen a failure since applying the above change.
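A simplified user-space model of the region loop with the debug change applied, under stated assumptions. Three regions are assumed, with region 0 standing for page_offset_base. A hypothetical xorshift generator, model_rand(), replaces prandom_bytes_state(). Each region's offset is added to its own default base rather than to the kernel's running vaddr with padding. The model therefore only shows that region 0 keeps its default base while later regions split the remaining entropy in 1 GiB (PUD-sized) steps. model_randomize() and the MODEL_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_REGIONS 3
/* 1 GiB alignment, mirroring PUD_MASK on x86_64 */
#define MODEL_PUD_MASK (~((1ULL << 30) - 1))

/* Hypothetical xorshift64 generator standing in for prandom_bytes_state(). */
static uint64_t model_rand(uint64_t *state)
{
	uint64_t x = *state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	*state = x;
	return x;
}

/*
 * Simplified model of the loop in kernel_randomize_memory() with
 * "if (i == 0) continue;" applied: region 0 keeps its default base and
 * consumes no entropy; later regions get a random, 1 GiB-aligned offset
 * drawn from their share of the remaining entropy.
 */
void model_randomize(uint64_t bases[MODEL_NR_REGIONS],
		     uint64_t remain_entropy, uint64_t seed)
{
	uint64_t state = seed ? seed : 1; /* xorshift must not start at 0 */

	for (size_t i = 0; i < MODEL_NR_REGIONS; i++) {
		uint64_t entropy;

		if (i == 0)
			continue; /* the debug change: leave region 0 alone */
		entropy = remain_entropy / (MODEL_NR_REGIONS - i);
		entropy = (model_rand(&state) % (entropy + 1)) & MODEL_PUD_MASK;
		bases[i] += entropy;
		remain_entropy -= entropy;
	}
}
```

Because region 0 is skipped before any entropy is drawn, its share stays in remain_entropy and is split among the remaining regions.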

> - Does any change in __PAGE_OFFSET lead to a crash, or only when
> __PAGE_OFFSET is in a specific range? Given that you may have to
> reboot multiple times to get a crash, I assume a specific range
> is the problem, but it might be worth checking.

Good point, will check.
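One way to answer the range question, sketched under assumptions. Record page_offset_base on each boot together with whether the boot failed, then bucket the values by their offset from the default base. The helpers offset_slot_tib() and tally_by_slot(), the 1 TiB bucket size, the 64-slot limit and the 0xffff880000000000 default (x86_64 4-level paging) are hypothetical choices for illustration, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_OFFSET_BASE 0xffff880000000000ULL
#define MODEL_TIB (1ULL << 40)
#define MODEL_MAX_SLOTS 64

/*
 * Hypothetical helper: map an observed page_offset_base value to the
 * index of the 1 TiB slot it falls in, counting from the default base.
 * Returns -1 for values below the default base.
 */
long offset_slot_tib(uint64_t page_offset_base)
{
	if (page_offset_base < MODEL_PAGE_OFFSET_BASE)
		return -1;
	return (long)((page_offset_base - MODEL_PAGE_OFFSET_BASE) / MODEL_TIB);
}

/*
 * Count boots and failures per slot. The caller zeroes boots[] and
 * fails[]; values outside [0, MODEL_MAX_SLOTS) are ignored.
 */
void tally_by_slot(const uint64_t *bases, const int *failed, size_t n,
		   unsigned boots[MODEL_MAX_SLOTS],
		   unsigned fails[MODEL_MAX_SLOTS])
{
	for (size_t i = 0; i < n; i++) {
		long slot = offset_slot_tib(bases[i]);

		if (slot < 0 || slot >= MODEL_MAX_SLOTS)
			continue;
		boots[slot]++;
		if (failed[i])
			fails[slot]++;
	}
}
```

If every failing boot lands in a few slots while the other slots only ever boot cleanly, that would point to a specific range rather than to any change of __PAGE_OFFSET.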