Re: [PATCH 0/2] ZERO PAGE again v3.

From: KAMEZAWA Hiroyuki
Date: Mon Jul 13 2009 - 01:47:52 EST


Do you think this kind of document is necessary for v4?
Any comments are welcome.
Many people are probably busy in Montreal, so I'm in no hurry ;)

==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>

Add documentation about the zero page as it is re-introduced.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
---
Documentation/vm/zeropage.txt | 77 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 77 insertions(+)

Index: zeropage-trialv4/Documentation/vm/zeropage.txt
===================================================================
--- /dev/null
+++ zeropage-trialv4/Documentation/vm/zeropage.txt
@@ -0,0 +1,77 @@
+Zero Page.
+
+The ZERO page is a page filled with zeros that is never modified (it is
+write-protected). Each arch has its own ZERO_PAGE in the kernel and provides
+the macro ZERO_PAGE(addr). At present, use of ZERO_PAGE() is limited.
+
+This document explains ZERO_PAGE() for private anonymous mappings.
+
+If CONFIG_SUPPORT_ANON_ZERO_PAGE=y, ZERO_PAGE is used for private anonymous
+mappings. When a read fault occurs on a private anonymous mapping, ZERO_PAGE
+is mapped at the faulting address instead of a regular anonymous page. The
+mapped ZERO_PAGE is write-protected, so a later write by the user process
+triggers copy-on-write. ZERO_PAGE is used only when the vma is a PRIVATE
+mapping and has no vm_ops.
+
+Implementation Details
+ - ZERO_PAGE is implemented with pte_special(), so an arch has to support
+ pte_special() to support ZERO_PAGE for anon.
+ - ZERO_PAGE for anon does no reference counter manipulation at map/unmap.
+ - When get_user_pages() finds ZERO_PAGE, page->count is taken and released.
+ - By passing the special flag FOLL_NOZERO, the caller can ignore zero pages.
+ - Because ZERO_PAGE is mapped only on a read fault to a MAP_PRIVATE anonymous
+ mapping, MAP_POPULATE could map ZERO_PAGE when it handles a read-only PRIVATE
+ anonymous mapping. Regular anonymous pages are used in that case instead.
+ - At coredump, ZERO_PAGE is used for memory that is not present.
+
+For User Applications.
+
+In many cases the ZERO page is not the best solution for an application. It
+tends to be second best if you have enough time to improve your application.
+
+Pros. of ZERO Page
+ - It does not consume extra memory.
+ - CPU cache overhead is small (if your cache is physically tagged).
+ - The page's reference count overhead is hidden. This is good for processes
+ that fork()/exec().
+
+Cons. of ZERO Page
+ - It is available only for read-faulted anonymous private mappings.
+ - An application that depends on ZERO_PAGE consumes extra TLB entries.
+ - It only reduces the memory usage of read-faulted pages.
+
+The ZERO page is helpful in some cases, but other techniques are available.
+The following are typical ways to avoid ZERO pages. Please note that there
+are always trade-offs among designs.
+
+ => Avoid large contiguous mappings and use small mmaps.
+ If the number of mmaps does not increase very much, this is good because
+ your application avoids TLB pollution by the ZERO page and never makes
+ unnecessary accesses.
+
+ => Use a large contiguous mapping and check /proc/<pid>/pagemap.
+ You can find out which ptes are valid by reading /proc/<pid>/pagemap
+ and so avoid unnecessary faults when scanning a memory range. But reading
+ /proc/<pid>/pagemap is not cheap, so the benefit of this technique
+ depends on usage.
+
+ => Use KSM (to be implemented).
+ KSM (kernel shared memory) can merge your anonymous mapped pages with pages
+ of the same contents. Zero-filled pages will then be merged, along with
+ other identical pages. But in a bad case, pages are heavily shared and this
+ may affect the performance of fork/exit/exec. Behavior depends on the latest
+ KSM implementation, so please check.
+
+For kernel developers.
+ Your arch has to support pte_special() and set ARCH_SUPPORT_ANON_ZERO_PAGE=y
+ to use the ZERO page. If your arch's CPU cache is virtually tagged, turning
+ this feature off is recommended. To test this, the following cases should
+ be checked.
+ - mmap/munmap/fork/exit/exec while touching anonymous private pages by READ.
+ - MAP_POPULATE in the above test.
+ - mlock()
+ - coredump
+ - /dev/zero PRIVATE mapping
+
+
+
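As a note outside the patch: the read-fault and copy-on-write behavior that zeropage.txt describes for private anonymous mappings can be exercised from user space. The following is only a minimal sketch. check_read_then_write() is a hypothetical helper, not part of the patch, and it cannot tell whether the kernel served the reads from ZERO_PAGE or from freshly zeroed pages, because both look identical to the application.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration, not from the patch):
 * map npages of private anonymous memory, read every page first, then write
 * to the first page. Returns 0 if every read saw zero and the write was
 * visible afterwards, -1 otherwise.
 */
int check_read_then_write(size_t npages)
{
	long psz = sysconf(_SC_PAGESIZE);
	assert(psz > 0 && npages > 0);

	size_t len = npages * (size_t)psz;
	volatile unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
					 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	int ok = 1;

	/* Read faults: with the patch these may be backed by ZERO_PAGE. */
	for (size_t i = 0; i < npages; i++)
		if (p[i * (size_t)psz] != 0)
			ok = 0;

	/* Write fault: the process must get its own writable page. */
	p[0] = 0x5a;
	if (p[0] != 0x5a)
		ok = 0;

	/* Pages that were only read must still read as zero. */
	if (npages > 1 && p[psz] != 0)
		ok = 0;

	munmap((void *)p, len);
	return ok ? 0 : -1;
}
```

Calling check_read_then_write(16) from a small test program covers the "touch anonymous private pages by READ" case in the simplest possible way. It does not cover fork/exec, mlock() or coredump.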

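The /proc/<pid>/pagemap technique mentioned in the document could look roughly like the sketch below. It assumes the pagemap layout of one 64-bit entry per virtual page with bit 63 meaning "page present". page_is_present() and scan_demo() are hypothetical names used only for illustration, and scan_demo() returns -1 when pagemap cannot be opened or read.

```c
#define _DEFAULT_SOURCE
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the page containing addr is present,
 * 0 if it is not, and -1 if the pagemap entry could not be read.
 */
int page_is_present(int pagemap_fd, const volatile void *addr)
{
	long psz = sysconf(_SC_PAGESIZE);
	uint64_t entry;
	/* One 8-byte entry per virtual page, indexed by virtual page number. */
	off_t off = (off_t)((uintptr_t)addr / (uintptr_t)psz) *
		    (off_t)sizeof(entry);

	if (pread(pagemap_fd, &entry, sizeof(entry), off) !=
	    (ssize_t)sizeof(entry))
		return -1;
	return (int)((entry >> 63) & 1);	/* bit 63: page present */
}

/*
 * Hypothetical demo: map 8 pages, write to two of them, and count how many
 * pages pagemap reports as present. Returns that count, or -1 on error.
 */
int scan_demo(void)
{
	enum { NPAGES = 8 };
	long psz = sysconf(_SC_PAGESIZE);
	size_t len = (size_t)NPAGES * (size_t)psz;

	int fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return -1;

	volatile unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
					 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return -1;
	}

	p[0] = 1;			/* write-fault page 0 */
	p[3 * (size_t)psz] = 1;		/* write-fault page 3 */

	int present = 0;
	for (int i = 0; i < NPAGES; i++) {
		int r = page_is_present(fd, p + (size_t)i * (size_t)psz);
		if (r < 0) {
			present = -1;
			break;
		}
		present += r;
	}

	munmap((void *)p, len);
	close(fd);
	return present;
}
```

A scanner would call page_is_present() before dereferencing each page and skip the pages that are not present, which avoids the read faults altogether. Each check costs a pread() system call, which is the cost the document warns about.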