[PATCH 5/5] mm: add process_memwatch syscall documentation

From: Muhammad Usama Anjum
Date: Tue Jul 26 2022 - 12:21:56 EST


Document the process_memwatch system call and explain its operations.

Signed-off-by: Muhammad Usama Anjum <usama.anjum@xxxxxxxxxxxxx>
---
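As an illustration of the start/len rules documented in the patch below
(start must be a multiple of the page size; len is effectively rounded up
to the next multiple of the page size), here is a minimal sketch of that
arithmetic as a caller might do it before invoking the syscall. The helper
names are hypothetical and not part of this series, and the page size is
passed in explicitly on the assumption that the caller obtains it
elsewhere, for example from sysconf(_SC_PAGESIZE).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true if page_size is a non-zero power of two. */
static bool page_size_is_valid(unsigned long page_size)
{
	return page_size != 0 && (page_size & (page_size - 1)) == 0;
}

/* Hypothetical helper: start must be a multiple of the page size. */
static bool memwatch_start_is_aligned(unsigned long start,
				      unsigned long page_size)
{
	assert(page_size_is_valid(page_size));
	return (start & (page_size - 1)) == 0;
}

/*
 * Hypothetical helper: len rounded up to whole pages, which is the
 * length the documentation says is effectively used. Assumes that
 * len + page_size - 1 does not overflow.
 */
static unsigned long memwatch_effective_len(unsigned long len,
					    unsigned long page_size)
{
	assert(page_size_is_valid(page_size));
	return (len + page_size - 1) & ~(page_size - 1);
}
```

With a 4096-byte page size, a len of 1 or 4096 covers one page and a len
of 4097 covers two.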
Documentation/admin-guide/mm/soft-dirty.rst | 48 ++++++++++++++++++++-
1 file changed, 47 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/mm/soft-dirty.rst b/Documentation/admin-guide/mm/soft-dirty.rst
index cb0cfd6672fa..030d75658010 100644
--- a/Documentation/admin-guide/mm/soft-dirty.rst
+++ b/Documentation/admin-guide/mm/soft-dirty.rst
@@ -5,7 +5,12 @@ Soft-Dirty PTEs
===============

The soft-dirty is a bit on a PTE which helps to track which pages a task
-writes to. In order to do this tracking one should
+writes to.
+
+Using Proc FS
+-------------
+
+In order to do this tracking one should

1. Clear soft-dirty bits from the task's PTEs.

@@ -20,6 +25,47 @@ writes to. In order to do this tracking one should
64-bit qword is the soft-dirty one. If set, the respective PTE was
written to since step 1.

+Using System Call
+-----------------
+
+The process_memwatch system call can be used to find the dirty pages::
+
+ long process_memwatch(int pidfd, unsigned long start, int len,
+ unsigned int flags, void *vec, int vec_len);
+
+The pidfd argument identifies the process whose memory is to be watched.
+The calling process must have PTRACE_MODE_ATTACH_FSCREDS access to the
+process identified by pidfd. If pidfd is zero, the calling process watches
+its own memory. The operation is determined by flags. The start argument
+must be a multiple of the system page size. The len argument need not be a
+multiple of the page size, but since the information is returned for whole
+pages, len is effectively rounded up to the next multiple of the page
+size.
+
+The vec argument is an output array in which the page offsets are returned.
+Each offset is calculated from the start address. The caller passes the size
+of vec in vec_len. The system call returns when the whole range has been
+searched or when vec is completely filled. If vec fills up completely, the
+range is not cleared in its entirety.
+
+The flags argument specifies the operation to be performed. MEMWATCH_SD_GET
+and MEMWATCH_SD_CLEAR can be used separately, or combined to get and clear
+the soft-dirty state atomically as one operation::
+
+ MEMWATCH_SD_GET
+ Get the offsets of the pages which are soft-dirty.
+
+ MEMWATCH_SD_CLEAR
+ Clear the soft-dirty bit of the pages which are soft-dirty.
+
+ MEMWATCH_SD_NO_REUSED_REGIONS
+ This optional flag can be specified in combination with other flags.
+ VM_SOFTDIRTY is ignored for the VMAs for performance reasons. With this
+ flag, only pages which have been written to by the user are reported as
+ dirty. New allocations are not reported as dirty.
+
+Explanation
+-----------

Internally, to do this tracking, the writable bit is cleared from PTEs
when the soft-dirty bit is cleared. So, after this, when the task tries to
--
2.30.2