Re: [criu] 1M guard page ruined restore

From: Oleg Nesterov
Date: Tue Jun 20 2017 - 06:51:24 EST


On 06/20, Cyrill Gorcunov wrote:
>
> | diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> | index f0c8b33..520802d 100644
> | --- a/fs/proc/task_mmu.c
> | +++ b/fs/proc/task_mmu.c
> | @@ -300,11 +300,7 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)
> |
> | /* We don't show the stack guard page in /proc/maps */
> | start = vma->vm_start;
> | - if (stack_guard_page_start(vma, start))
> | - start += PAGE_SIZE;
> | end = vma->vm_end;
> | - if (stack_guard_page_end(vma, end))
> | - end -= PAGE_SIZE;
> |
> | seq_setwidth(m, 25 + sizeof(void *) * 6 - 1);
> | seq_printf(m, "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu ",
>
> For which we of course are not ready because we've been implying the
> guard page is returned here so we adjust addresses locally when saving
> them into images.
>
> So now we need to figure out somehow if show_map_vma accounts [PAGE_SIZE|guard_area] or not,
> I guess we might use kernel version here but it won't be working fine on custom kernels,
> or kernels with the patch backported.

You can write a simple test. Just do mmap(MAP_GROWSDOWN) and look at
/proc/self/maps. If it reports vm_start + PAGE_SIZE rather than addr
returned by mmap, then the kernel is old.

> Second I guess we might need to detect @stack_guard_gap runtime as
> well

I do not think so. criu does not need to know about the new guard area
at all. It simply doesn't exist from the user-space point of view.

In fact, I think this should have been true even before this change. It is
just that stack_guard_page_start() was not accurate, and this is (I guess)
the reason you had to play with the stack guard: the first page (hidden by
show_map_vma) can hold valid stack data, for example if the application
played with MAP_FIXED or munmap().

So I think you should simply disable, say, unmap_guard_pages() and most
of the other MAP_GROWSDOWN code in criu.

Oleg.