[PATCH 38/70] mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas

From: Luis Henriques
Date: Tue Jun 04 2013 - 10:18:50 EST


3.5.7.14 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Cliff Wickman <cpw@xxxxxxx>

commit a9ff785e4437c83d2179161e012f5bdfbd6381f0 upstream.

A panic can be caused by simply cat'ing /proc/<pid>/smaps while an
application has a VM_PFNMAP range. It happened in-house when a
benchmarker was trying to decipher the memory layout of his program.

/proc/<pid>/smaps and similar walks through a user page table should not
be looking at VM_PFNMAP areas.

Certain tests in walk_page_range() (specifically split_huge_page_pmd())
assume that all the mapped PFN's are backed with page structures. And
this is not usually true for VM_PFNMAP areas. This can result in panics
on kernel page faults when attempting to address those page structures.

There are a half dozen callers of walk_page_range() that walk through a
task's entire page table (as N. Horiguchi pointed out). So rather than
change all of them, this patch changes just walk_page_range() to ignore
VM_PFNMAP areas.

The logic of hugetlb_vma() is moved back into walk_page_range(), as we
want to test any vma in the range.

VM_PFNMAP areas are used by:
- graphics memory manager gpu/drm/drm_gem.c
- global reference unit sgi-gru/grufile.c
- sgi special memory char/mspec.c
- and probably several out-of-tree modules

[akpm@xxxxxxxxxxxxxxxxxxxx: remove now-unused hugetlb_vma() stub]
Signed-off-by: Cliff Wickman <cpw@xxxxxxx>
Reviewed-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Cc: Mel Gorman <mel@xxxxxxxxx>
Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxx>
Cc: David Sterba <dsterba@xxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxx>
Cc: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Luis Henriques <luis.henriques@xxxxxxxxxxxxx>
---
mm/pagewalk.c | 70 ++++++++++++++++++++++++++++++-----------------------------
1 file changed, 36 insertions(+), 34 deletions(-)

diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 6c118d0..449cad6 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -127,28 +127,7 @@ static int walk_hugetlb_range(struct vm_area_struct *vma,
return 0;
}

-static struct vm_area_struct* hugetlb_vma(unsigned long addr, struct mm_walk *walk)
-{
- struct vm_area_struct *vma;
-
- /* We don't need vma lookup at all. */
- if (!walk->hugetlb_entry)
- return NULL;
-
- VM_BUG_ON(!rwsem_is_locked(&walk->mm->mmap_sem));
- vma = find_vma(walk->mm, addr);
- if (vma && vma->vm_start <= addr && is_vm_hugetlb_page(vma))
- return vma;
-
- return NULL;
-}
-
#else /* CONFIG_HUGETLB_PAGE */
-static struct vm_area_struct* hugetlb_vma(unsigned long addr, struct mm_walk *walk)
-{
- return NULL;
-}
-
static int walk_hugetlb_range(struct vm_area_struct *vma,
unsigned long addr, unsigned long end,
struct mm_walk *walk)
@@ -198,30 +177,53 @@ int walk_page_range(unsigned long addr, unsigned long end,
if (!walk->mm)
return -EINVAL;

+ VM_BUG_ON(!rwsem_is_locked(&walk->mm->mmap_sem));
+
pgd = pgd_offset(walk->mm, addr);
do {
- struct vm_area_struct *vma;
+ struct vm_area_struct *vma = NULL;

next = pgd_addr_end(addr, end);

/*
- * handle hugetlb vma individually because pagetable walk for
- * the hugetlb page is dependent on the architecture and
- * we can't handled it in the same manner as non-huge pages.
+ * This function was not intended to be vma based.
+ * But there are vma special cases to be handled:
+ * - hugetlb vma's
+ * - VM_PFNMAP vma's
*/
- vma = hugetlb_vma(addr, walk);
+ vma = find_vma(walk->mm, addr);
if (vma) {
- if (vma->vm_end < next)
+ /*
+ * There are no page structures backing a VM_PFNMAP
+ * range, so do not allow split_huge_page_pmd().
+ */
+ if ((vma->vm_start <= addr) &&
+ (vma->vm_flags & VM_PFNMAP)) {
next = vma->vm_end;
+ pgd = pgd_offset(walk->mm, next);
+ continue;
+ }
/*
- * Hugepage is very tightly coupled with vma, so
- * walk through hugetlb entries within a given vma.
+ * Handle hugetlb vma individually because pagetable
+ * walk for the hugetlb page is dependent on the
+ * architecture and we can't handled it in the same
+ * manner as non-huge pages.
*/
- err = walk_hugetlb_range(vma, addr, next, walk);
- if (err)
- break;
- pgd = pgd_offset(walk->mm, next);
- continue;
+ if (walk->hugetlb_entry && (vma->vm_start <= addr) &&
+ is_vm_hugetlb_page(vma)) {
+ if (vma->vm_end < next)
+ next = vma->vm_end;
+ /*
+ * Hugepage is very tightly coupled with vma,
+ * so walk through hugetlb entries within a
+ * given vma.
+ */
+ err = walk_hugetlb_range(vma, addr, next, walk);
+ if (err)
+ break;
+ pgd = pgd_offset(walk->mm, next);
+ continue;
+ }
}

if (pgd_none_or_clear_bad(pgd)) {
--
1.8.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/