[test patch] dirty shared mappings (was Re: ... fragmentation)

Benjamin C.R. LaHaise (blah@kvack.org)
Wed, 7 Jan 1998 17:48:46 -0500 (EST)


On Tue, 6 Jan 1998 tytso@mit.edu wrote:

> Date: Fri, 2 Jan 1998 12:09:45 -0800 (PST)
> From: Linus Torvalds <torvalds@transmeta.com>
>
> It's only when you want a _specific_ physical page that the lack of
> reverse mapping is painful. That does happen with shared non-COW pages
> occasionally (paging them out would be complex), but UNIX semantics tends
> to make it fairly easy - the only shared non-COW pages that exist have a
> well-specified backing store that all processes can agree about, so there
> is no ambiguity about where on the disk a page should be. It does result
> in potentially unnecessary page-outs (when multiple processes have the
> same page dirty), but it's a pretty rare condition.
>
> Actually, there's one *really* large and very successful commercial
> company that I know of where a developer has found this to be a pretty
> major bottleneck in performance due to the way their application
> libraries use shared libraries --- he's done benchmarks and proved it.

Thinking about it, this might be a factor in the major slowdown of inn at
exec time - an munmap of the shared file will sync it, right?

> (If you think about it, it's actually not all that surprising. If
> you're using shared memory or a mmap'ed file for doing IPC, it's
> actually quite likely that multiple processes will be dirtying the same
> page. Certain news implementations where the group file is mmap'ed in
> would have the same property.)
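To make the IPC case in the quote concrete, here is a small user-space sketch that is not from the original thread. Several processes map the same file with MAP_SHARED, and each one writes into the same page, so that page is dirty in several page tables at once. The function name `shared_page_demo` and the 4 KiB page size are assumptions made for illustration.

```c
#define _DEFAULT_SOURCE /* expose fileno()/ftruncate() even under strict -std= modes */
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define DEMO_PAGE_BYTES 4096 /* assumption: a single 4 KiB page */

/* Hypothetical demo: fork nproc children. Each child writes its own slot
 * of the SAME page of a MAP_SHARED file mapping, so that page is dirtied
 * through nproc different page tables. Returns the sum of the slots as
 * the parent sees them afterwards, or -1 on error. */
long shared_page_demo(int nproc)
{
    if (nproc < 0 || (size_t)nproc > DEMO_PAGE_BYTES / sizeof(long))
        return -1;

    FILE *f = tmpfile();               /* unnamed temporary backing file */
    if (f == NULL)
        return -1;
    if (ftruncate(fileno(f), DEMO_PAGE_BYTES) != 0) { /* one zero-filled page */
        fclose(f);
        return -1;
    }
    long *slot = mmap(NULL, DEMO_PAGE_BYTES, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fileno(f), 0);
    if (slot == MAP_FAILED) {
        fclose(f);
        return -1;
    }

    int failed = 0;
    for (int i = 0; i < nproc; i++) {
        pid_t pid = fork();
        if (pid == 0) {                /* child: dirty the shared page, then leave */
            slot[i] = i + 1;
            _exit(0);
        }
        if (pid < 0) {
            failed = 1;
            break;
        }
    }
    while (wait(NULL) > 0)             /* reap the children */
        ;

    long sum = 0;
    for (int i = 0; i < nproc; i++)
        sum += slot[i];
    munmap(slot, DEMO_PAGE_BYTES);
    fclose(f);
    return failed ? -1 : sum;
}
```

With four children, `shared_page_demo(4)` returns 10 (1 + 2 + 3 + 4). Every write lands in the single page-cache page that backs the file, and that is the situation the quote describes.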

Well, which of the two approaches do we want to take? I already have
a working (and by now, well-tested) pte lists patch, with the drawback of
doubling the size of page tables. Alternatively, throwing together a
patch that walks i_mmap and undirties pages wouldn't be too hard. <pause>
Done! This is untested, but it compiles and looks right. Just looking at
how it works shows that it'll probably thrash the cache quite badly: every
page written out means walking every vma on the inode's i_mmap list and
doing a three-level page table lookup in each one. The pte_list stuff
instead pays the price over time by linking/unlinking.

Hmmm... I'm finally agreeing with Linus - now if only shared private
pages had an inode mapping we could walk... But then they must all be
clean (unless they're a privately mapped file not in the swap cache and
there's an exec). Ho hum.

-ben

diff -ur linux-2.1.78/mm/filemap.c linux/mm/filemap.c
--- linux-2.1.78/mm/filemap.c Sun Jan 4 03:53:41 1998
+++ linux/mm/filemap.c Wed Jan 7 17:27:22 1998
@@ -924,6 +924,37 @@
 	return retval;
 }

+/*
+ * This is simpler than I thought it would be, but it will take cache misses like crazy. --bcrl
+ */
+static void mark_inode_mappings_clean(struct inode *inode, unsigned long offset, unsigned long page)
+{
+	struct vm_area_struct *vma;
+
+	for (vma = inode->i_mmap; NULL != vma; vma = vma->vm_next_share) {
+		unsigned long addr = offset - vma->vm_offset + vma->vm_start;
+		if ((vma->vm_offset <= offset) &&
+		    (addr < vma->vm_end)) {
+			pgd_t *pgd;
+			pmd_t *pmd;
+			pte_t *pte;
+
+			pgd = pgd_offset(vma->vm_mm, addr);
+			if (pgd_none(*pgd) || pgd_bad(*pgd))
+				continue;
+			pmd = pmd_offset(pgd, addr);
+			if (pmd_none(*pmd) || pmd_bad(*pmd))
+				continue;
+			pte = pte_offset(pmd, addr);
+			if (!pte_present(*pte) || !pte_dirty(*pte) || (pte_page(*pte) != page))
+				continue;
+
+			set_pte(pte, pte_mkclean(*pte));
+			flush_tlb_page(vma, addr);
+		}
+	}
+}
+
 static int filemap_write_page(struct vm_area_struct * vma,
 	unsigned long offset,
 	unsigned long page)
@@ -934,6 +965,10 @@
 	struct inode * inode;
 	struct buffer_head * bh;
 
+	dentry = vma->vm_dentry;
+	inode = dentry->d_inode;
+	mark_inode_mappings_clean(inode, offset, page);
+
 	bh = mem_map[MAP_NR(page)].buffers;
 	if (bh) {
 		/* whee.. just mark the buffer heads dirty */
@@ -949,8 +984,6 @@
 		return 0;
 	}
 
-	dentry = vma->vm_dentry;
-	inode = dentry->d_inode;
 	file.f_op = inode->i_op->default_file_ops;
 	if (!file.f_op->write)
 		return -EIO;
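The address arithmetic in the i_mmap walk is the easiest part to get wrong, so here is a standalone sketch of it. `struct mapping` and `file_offset_to_addr` are hypothetical names. The three fields mirror the 2.1-era vm_area_struct members the patch uses, with vm_offset as a byte offset into the file. A file offset is mapped by a vma only if it is at or beyond vm_offset and the computed address lies below vm_end.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the three vm_area_struct fields the walk uses. */
struct mapping {
    unsigned long vm_start;   /* first user virtual address of the mapping */
    unsigned long vm_end;     /* one past the last user virtual address */
    unsigned long vm_offset;  /* file byte offset that vm_start maps */
};

/* If file offset `offset` is mapped by `m`, store its user virtual
 * address in *addr and return true; otherwise return false. */
bool file_offset_to_addr(const struct mapping *m, unsigned long offset,
                         unsigned long *addr)
{
    if (offset < m->vm_offset)                  /* before the mapped window */
        return false;
    unsigned long delta = offset - m->vm_offset;
    if (delta >= m->vm_end - m->vm_start)       /* at or past vm_end */
        return false;
    *addr = m->vm_start + delta;                /* cannot overflow past vm_end */
    return true;
}
```

For a three-page mapping at 0x40000000 that starts at file offset 0x2000, offset 0x4000 maps to 0x40002000. Offsets 0x1000 and 0x5000 fall outside the mapping. Comparing the remaining distance against the window size, and not the computed address against vm_end, avoids unsigned wrap-around for very large offsets.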