Re: [PATCH 3.10 268/268] mm: larger stack guard gap, between vmas

From: Hugh Dickins
Date: Wed Jun 21 2017 - 03:05:26 EST


On Mon, 19 Jun 2017, Willy Tarreau wrote:

> From: Hugh Dickins <hughd@xxxxxxxxxx>
>
> commit 1be7107fbe18eed3e319a6c3e83c78254b693acb upstream.

Some of these suggested adjustments below are just what comparing mine
and yours showed up, and I'm being anal in passing them on: e.g. I do
like your blank line in mm.h, but Michal chose to leave it out, and
I think that the closer we keep these sources to each other,
the less trouble we shall have patching on top in future.

Which is particularly true in expand_upwards() and expand_downwards()
(and you're thinking of backporting Helge's TASK_SIZE enhancement
on top of that, though I don't think it's strictly necessary for a
stable tree). Your patch is not wrong there, though it's odd to be
calling anon_vma_prepare() twice in expand_downwards(), and tiresome
to have to unlock at each error exit. But I'd already decided in one
of our internal trees just to factor in some of Konstantin's change,
which made the ordering much more sensible there, and made the two
functions more like each other; so I recommend that 3.10 do the same,
keeping it closer to the final 4.12 code. But you may have different
priorities and disagree with that: I'm just suggesting.

And there is the possibility that we shall want another patch or
two on top there. I've left a question as to whether we should be
comparing anon_vmas. And there's a potential (but I think ignorable)
locking issue, in the case of an architecture that supports both
VM_GROWSUP and VM_GROWSDOWN: if they expand towards each other at the
same instant, they could gobble up the gap between them (they almost
certainly have different anon_vmas, so the anon_vma locking does not
protect against that). When it gets to updating the vma tree, it is
careful to use page_table_lock to maintain the consistency of the
tree in such a case, but maybe we should do that earlier.
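
To spell out the race I have in mind (illustrative only, and purely
hypothetical: none of this is in any tree, built, or tested):

	/*
	 * CPU0: expand_upwards(vma)        CPU1: expand_downwards(next)
	 *   next->vm_start >= gap_addr:      prev->vm_end <= gap_addr:
	 *     check passes                      check passes
	 *   vma->vm_end grows up              next->vm_start grows down
	 *
	 * Each check was made against the other's old boundary, so the
	 * two stacks can meet and the guard gap between them is gone.
	 * "Doing it earlier" might mean redoing the check under the
	 * same lock the tree update takes, e.g. in expand_upwards()
	 * (mm being vma->vm_mm):
	 */
	spin_lock(&mm->page_table_lock);
	next = vma->vm_next;
	if (next && next->vm_start < gap_addr) {
		spin_unlock(&mm->page_table_lock);
		return -ENOMEM;
	}
	/* ... grow vma->vm_end and update the tree before unlocking ... */
	spin_unlock(&mm->page_table_lock);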

Then there's the FOLL_MLOCK thing, and the WARN_ON (phew, remembered
in time that you don't have VM_WARN_ON) - but keep in mind that I
have not even built this tree, let alone tested it.
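
(For reference: upstream's VM_WARN_ON() is only a CONFIG_DEBUG_VM
wrapper, so if 3.10 ever wanted that cheaper form instead of the
bare WARN_ON() below, a minimal local fallback could look like this
sketch - simplified, hypothetical, and no part of this patch:

/* Simplified, hypothetical stand-in: upstream's !CONFIG_DEBUG_VM
 * version still type-checks cond via BUILD_BUG_ON_INVALID(cond);
 * this sketch simply discards it.
 */
#ifdef CONFIG_DEBUG_VM
#define VM_WARN_ON(cond)	WARN_ON(cond)
#else
#define VM_WARN_ON(cond)	do { } while (0)
#endif

but a bare WARN_ON() is quite good enough for a stable tree.)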

Sorry if I'm being annoying, Willy: you must be heartily sick of
these patches by now! Or, being a longtime longterm maintainer,
perhaps it's all joy for you ;-?

Hugh

diff -purN 310n/include/linux/mm.h 310h/include/linux/mm.h
--- 310n/include/linux/mm.h	2017-06-20 16:50:29.809546868 -0700
+++ 310h/include/linux/mm.h	2017-06-20 19:52:59.359942133 -0700
@@ -1595,7 +1595,6 @@ unsigned long ra_submit(struct file_ra_s
 			struct file *filp);
 
 extern unsigned long stack_guard_gap;
-
 /* Generic expand stack which grows the stack according to GROWS{UP,DOWN} */
 extern int expand_stack(struct vm_area_struct *vma, unsigned long address);
 
diff -purN 310n/mm/memory.c 310h/mm/memory.c
--- 310n/mm/memory.c	2017-06-20 16:50:29.809546868 -0700
+++ 310h/mm/memory.c	2017-06-20 19:57:14.537573559 -0700
@@ -1821,9 +1821,6 @@ long __get_user_pages(struct task_struct
 				int ret;
 				unsigned int fault_flags = 0;
 
-				/* mlock all present pages, but do not fault in new pages */
-				if (foll_flags & FOLL_MLOCK)
-					goto next_page;
 				if (foll_flags & FOLL_WRITE)
 					fault_flags |= FAULT_FLAG_WRITE;
 				if (nonblocking)
diff -purN 310n/mm/mmap.c 310h/mm/mmap.c
--- 310n/mm/mmap.c	2017-06-20 16:50:29.809546868 -0700
+++ 310h/mm/mmap.c	2017-06-20 20:48:08.409202485 -0700
@@ -892,7 +892,7 @@ again:			remove_next = 1 + (end > next->
 		else if (next)
 			vma_gap_update(next);
 		else
-			mm->highest_vm_end = end;
+			WARN_ON(mm->highest_vm_end != vm_end_gap(vma));
 	}
 	if (insert && file)
 		uprobe_mmap(insert);
@@ -2123,48 +2123,39 @@ int expand_upwards(struct vm_area_struct
 {
 	struct vm_area_struct *next;
 	unsigned long gap_addr;
-	int error;
+	int error = 0;
 
 	if (!(vma->vm_flags & VM_GROWSUP))
 		return -EFAULT;
 
-	/*
-	 * We must make sure the anon_vma is allocated
-	 * so that the anon_vma locking is not a noop.
-	 */
-	if (unlikely(anon_vma_prepare(vma)))
-		return -ENOMEM;
-	vma_lock_anon_vma(vma);
-
-	/*
-	 * vma->vm_start/vm_end cannot change under us because the caller
-	 * is required to hold the mmap_sem in read mode.  We need the
-	 * anon_vma lock to serialize against concurrent expand_stacks.
-	 * Also guard against wrapping around to address 0.
-	 */
+	/* Guard against wrapping around to address 0. */
 	address &= PAGE_MASK;
 	address += PAGE_SIZE;
-	if (!address) {
-		vma_unlock_anon_vma(vma);
+	if (!address)
 		return -ENOMEM;
-	}
-	error = 0;
 
 	/* Enforce stack_guard_gap */
 	gap_addr = address + stack_guard_gap;
-	if (gap_addr < address) {
-		vma_unlock_anon_vma(vma);
+	if (gap_addr < address)
 		return -ENOMEM;
-	}
 	next = vma->vm_next;
 	if (next && next->vm_start < gap_addr) {
-		if (!(next->vm_flags & VM_GROWSUP)) {
-			vma_unlock_anon_vma(vma);
+		if (!(next->vm_flags & VM_GROWSUP))
 			return -ENOMEM;
-		}
 		/* Check that both stack segments have the same anon_vma? */
 	}
 
+	/* We must make sure the anon_vma is allocated. */
+	if (unlikely(anon_vma_prepare(vma)))
+		return -ENOMEM;
+
+	/*
+	 * vma->vm_start/vm_end cannot change under us because the caller
+	 * is required to hold the mmap_sem in read mode.  We need the
+	 * anon_vma lock to serialize against concurrent expand_stacks.
+	 */
+	vma_lock_anon_vma(vma);
+
 	/* Somebody else might have raced and expanded it already */
 	if (address > vma->vm_end) {
 		unsigned long size, grow;
@@ -2218,46 +2209,32 @@ int expand_downwards(struct vm_area_stru
 	unsigned long gap_addr;
 	int error;
 
-	/*
-	 * We must make sure the anon_vma is allocated
-	 * so that the anon_vma locking is not a noop.
-	 */
-	if (unlikely(anon_vma_prepare(vma)))
-		return -ENOMEM;
-
 	address &= PAGE_MASK;
 	error = security_mmap_addr(address);
 	if (error)
 		return error;
 
-	vma_lock_anon_vma(vma);
-
 	/* Enforce stack_guard_gap */
 	gap_addr = address - stack_guard_gap;
-	if (gap_addr > address) {
-		vma_unlock_anon_vma(vma);
+	if (gap_addr > address)
 		return -ENOMEM;
-	}
 	prev = vma->vm_prev;
 	if (prev && prev->vm_end > gap_addr) {
-		if (!(prev->vm_flags & VM_GROWSDOWN)) {
-			vma_unlock_anon_vma(vma);
+		if (!(prev->vm_flags & VM_GROWSDOWN))
 			return -ENOMEM;
-		}
 		/* Check that both stack segments have the same anon_vma? */
 	}
 
 	/* We must make sure the anon_vma is allocated. */
-	if (unlikely(anon_vma_prepare(vma))) {
-		vma_unlock_anon_vma(vma);
+	if (unlikely(anon_vma_prepare(vma)))
 		return -ENOMEM;
-	}
 
 	/*
 	 * vma->vm_start/vm_end cannot change under us because the caller
 	 * is required to hold the mmap_sem in read mode.  We need the
 	 * anon_vma lock to serialize against concurrent expand_stacks.
 	 */
+	vma_lock_anon_vma(vma);
 
 	/* Somebody else might have raced and expanded it already */
 	if (address < vma->vm_start) {