[PATCH 3.16.y-ckt 178/183] mm: softdirty: unmapped addresses between VMAs are clean

From: Luis Henriques
Date: Fri Mar 06 2015 - 05:02:09 EST


3.16.7-ckt8 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Peter Feiner <pfeiner@xxxxxxxxxx>

commit 81d0fa623c5b8dbd5279d9713094b0f9b0a00fb4 upstream.

If a /proc/pid/pagemap read spans a [VMA, an unmapped region, then a
VM_SOFTDIRTY VMA], the virtual pages in the unmapped region are reported
as softdirty. Here's a program to demonstrate the bug:

int main() {
const uint64_t PAGEMAP_SOFTDIRTY = 1ul << 55;
uint64_t pme[3];
int fd = open("/proc/self/pagemap", O_RDONLY);;
char *m = mmap(NULL, 3 * getpagesize(), PROT_READ,
MAP_ANONYMOUS | MAP_SHARED, -1, 0);
munmap(m + getpagesize(), getpagesize());
pread(fd, pme, 24, (unsigned long) m / getpagesize() * 8);
assert(pme[0] & PAGEMAP_SOFTDIRTY); /* passes */
assert(!(pme[1] & PAGEMAP_SOFTDIRTY)); /* fails */
assert(pme[2] & PAGEMAP_SOFTDIRTY); /* passes */
return 0;
}

(Note that all pages in new VMAs are softdirty until cleared).

Tested:
Used the program given above. I'm going to include this code in
a selftest in the future.

[n-horiguchi@xxxxxxxxxxxxx: prevent pagemap_pte_range() from overrunning]
Signed-off-by: Peter Feiner <pfeiner@xxxxxxxxxx>
Cc: "Kirill A. Shutemov" <kirill@xxxxxxxxxxxxx>
Cc: Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
Cc: Pavel Emelyanov <xemul@xxxxxxxxxxxxx>
Cc: Jamie Liu <jamieliu@xxxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Signed-off-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
[ luis: 3.16-stable prereq for:
05fbf357d941 "proc/pagemap: walk page tables under pte lock" ]
Signed-off-by: Luis Henriques <luis.henriques@xxxxxxxxxxxxx>
---
fs/proc/task_mmu.c | 61 +++++++++++++++++++++++++++++++++++-------------------
1 file changed, 40 insertions(+), 21 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index cfa63ee92c96..143674aef97c 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1005,7 +1005,6 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
spinlock_t *ptl;
pte_t *pte;
int err = 0;
- pagemap_entry_t pme = make_pme(PM_NOT_PRESENT(pm->v2));

/* find the first VMA at or above 'addr' */
vma = find_vma(walk->mm, addr);
@@ -1019,6 +1018,7 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,

for (; addr != end; addr += PAGE_SIZE) {
unsigned long offset;
+ pagemap_entry_t pme;

offset = (addr & ~PAGEMAP_WALK_MASK) >>
PAGE_SHIFT;
@@ -1033,32 +1033,51 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,

if (pmd_trans_unstable(pmd))
return 0;
- for (; addr != end; addr += PAGE_SIZE) {
- int flags2;
-
- /* check to see if we've left 'vma' behind
- * and need a new, higher one */
- if (vma && (addr >= vma->vm_end)) {
- vma = find_vma(walk->mm, addr);
- if (vma && (vma->vm_flags & VM_SOFTDIRTY))
- flags2 = __PM_SOFT_DIRTY;
- else
- flags2 = 0;
- pme = make_pme(PM_NOT_PRESENT(pm->v2) | PM_STATUS2(pm->v2, flags2));
+
+ while (1) {
+ /* End of address space hole, which we mark as non-present. */
+ unsigned long hole_end;
+
+ if (vma)
+ hole_end = min(end, vma->vm_start);
+ else
+ hole_end = end;
+
+ for (; addr < hole_end; addr += PAGE_SIZE) {
+ pagemap_entry_t pme = make_pme(PM_NOT_PRESENT(pm->v2));
+
+ err = add_to_pagemap(addr, &pme, pm);
+ if (err)
+ return err;
}

- /* check that 'vma' actually covers this address,
- * and that it isn't a huge page vma */
- if (vma && (vma->vm_start <= addr) &&
- !is_vm_hugetlb_page(vma)) {
+ if (!vma || vma->vm_start >= end)
+ break;
+ /*
+ * We can't possibly be in a hugetlb VMA. In general,
+ * for a mm_walk with a pmd_entry and a hugetlb_entry,
+ * the pmd_entry can only be called on addresses in a
+ * hugetlb if the walk starts in a non-hugetlb VMA and
+ * spans a hugepage VMA. Since pagemap_read walks are
+ * PMD-sized and PMD-aligned, this will never be true.
+ */
+ BUG_ON(is_vm_hugetlb_page(vma));
+
+ /* Addresses in the VMA. */
+ for (; addr < min(end, vma->vm_end); addr += PAGE_SIZE) {
+ pagemap_entry_t pme;
pte = pte_offset_map(pmd, addr);
pte_to_pagemap_entry(&pme, pm, vma, addr, *pte);
- /* unmap before userspace copy */
pte_unmap(pte);
+ err = add_to_pagemap(addr, &pme, pm);
+ if (err)
+ return err;
}
- err = add_to_pagemap(addr, &pme, pm);
- if (err)
- return err;
+
+ if (addr == end)
+ break;
+
+ vma = find_vma(walk->mm, addr);
}

cond_resched();
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/