[RFC 4/4] docs: arm64: Add help document for pte dirty state management

From: Anshuman Khandual
Date: Fri Jul 07 2023 - 01:34:20 EST


PTE dirty state management is non-trivial on the arm64 platform. This document
explains how software and hardware work together to correctly track PTE dirty
state across various page table transactions.

Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
Cc: Will Deacon <will@xxxxxxxxxx>
Cc: Jonathan Corbet <corbet@xxxxxxx>
Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
Cc: linux-doc@xxxxxxxxxxxxxxx
Signed-off-by: Anshuman Khandual <anshuman.khandual@xxxxxxx>
---
Documentation/arch/arm64/index.rst | 1 +
Documentation/arch/arm64/pte-dirty.rst | 95 ++++++++++++++++++++++++++
2 files changed, 96 insertions(+)
create mode 100644 Documentation/arch/arm64/pte-dirty.rst

diff --git a/Documentation/arch/arm64/index.rst b/Documentation/arch/arm64/index.rst
index d08e924204bf..522f887f2a60 100644
--- a/Documentation/arch/arm64/index.rst
+++ b/Documentation/arch/arm64/index.rst
@@ -22,6 +22,7 @@ ARM64 Architecture
perf
pointer-authentication
ptdump
+ pte-dirty
silicon-errata
sme
sve
diff --git a/Documentation/arch/arm64/pte-dirty.rst b/Documentation/arch/arm64/pte-dirty.rst
new file mode 100644
index 000000000000..a6401696f6a3
--- /dev/null
+++ b/Documentation/arch/arm64/pte-dirty.rst
@@ -0,0 +1,95 @@
+.. SPDX-License-Identifier: GPL-2.0
+=========================================
+Page Table Entry - Dirty State Management
+=========================================
+
+1. Introduction
+---------------
+
+The arm64 platform defines pte_dirty() to determine whether a pte has been
+dirtied, i.e. whether it has been written to since the previous clean
+procedure. Dirty state can be tracked via a software or a hardware pte
+dirty bit mechanism. On arm64, pte_dirty() is implemented using both the
+software and hardware dirty bits, which makes it less intuitive than on
+many other platforms.
+
+2. PTE Dirty Bits (SW and HW)
+-----------------------------
+The following PTE bit positions are relevant for dirty state tracking.
+
+- PTE_DIRTY is a software bit (55) in the PTE
+- PTE_RDONLY is a hardware bit (7) in the PTE
+- PTE_DBM is a hardware bit (51) in the PTE
+- PTE_WRITE is a hardware bit (51) in the PTE, sharing its position with PTE_DBM
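
As a rough illustration of the bit layout listed above, the following
user-space sketch decodes the dirty state of a raw 64-bit PTE value. The
bit positions come from the list; the pteval_t typedef and the
model_pte_dirty() helper are hypothetical names used only for this
example and are not the kernel's own definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: a PTE is modelled as a plain 64-bit value. */
typedef uint64_t pteval_t;

#define PTE_RDONLY ((pteval_t)1 << 7)   /* hardware read-only bit */
#define PTE_DBM    ((pteval_t)1 << 51)  /* hardware dirty bit modifier */
#define PTE_WRITE  PTE_DBM              /* same position as PTE_DBM */
#define PTE_DIRTY  ((pteval_t)1 << 55)  /* software dirty bit */

/* Hypothetical helper: dirty if either the SW or the HW state says so. */
static bool model_pte_dirty(pteval_t v)
{
	bool hw_dirty = (v & PTE_WRITE) && !(v & PTE_RDONLY);
	bool sw_dirty = (v & PTE_DIRTY) != 0;

	return sw_dirty || hw_dirty;
}
```

In this model, a value with PTE_WRITE set and PTE_RDONLY clear reads as
dirty even when PTE_DIRTY is clear, while a non-writable value can only be
dirty through PTE_DIRTY.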
+
+3. PTE Dirty State Tracking
+---------------------------
+Without ARM64_HW_AFDBM enabled, PTE dirty state is tracked only in SW. The
+PTE is marked read-only in HW, so a subsequent write access generates a
+page fault, during which the SW dirty bit can be set and the HW read-only
+bit cleared.
+
+With ARM64_HW_AFDBM enabled, PTE dirty state is tracked in both SW and HW.
+The PTE is marked read-only in HW with DBM tracking also enabled. A write
+access then clears the read-only bit in hardware without raising a page
+fault. As PTE_DBM and PTE_WRITE share the same bit position, a dirty
+non-writable PTE state cannot be represented in hardware. Dirty state
+tracking therefore has to accommodate both the software and hardware PTE
+bits even with ARM64_HW_AFDBM enabled. Checking both also avoids a runtime
+check for whether the ARM64_HW_AFDBM feature is enabled on a given
+implementation.
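
The two tracking modes described above can be modelled in user space as
follows. This is only a sketch under stated assumptions: the PTE is a raw
64-bit value, model_write_access() is a hypothetical helper that stands in
for both the hardware update and the software fault path, and the real
fault handling in the kernel involves far more than is shown here.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pteval_t;              /* assumption: raw 64-bit PTE */

#define PTE_RDONLY ((pteval_t)1 << 7)
#define PTE_DBM    ((pteval_t)1 << 51)
#define PTE_WRITE  PTE_DBM
#define PTE_DIRTY  ((pteval_t)1 << 55)

/*
 * Hypothetical model of a single write access to the page mapped by *pte.
 * Returns true if the access would fault to software.
 */
static bool model_write_access(pteval_t *pte, bool hw_afdbm)
{
	if (!(*pte & PTE_RDONLY))
		return false;           /* already writable in HW */

	if (!(*pte & PTE_WRITE))
		return true;            /* truly write protected, not modelled */

	if (hw_afdbm) {
		*pte &= ~PTE_RDONLY;    /* HW clears read-only, no fault */
		return false;
	}

	/* SW-only tracking: the fault path records the dirty state. */
	*pte |= PTE_DIRTY;
	*pte &= ~PTE_RDONLY;
	return true;
}
```

Starting from a clean writable value (PTE_WRITE | PTE_RDONLY), the model
ends up HW dirty without a fault when hw_afdbm is true, and both SW dirty
and writable after a fault when it is false.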
+
+Testing and clearing PTE dirty state is relatively simple:
+
+#define pte_hw_dirty(pte) (pte_write(pte) && !pte_rdonly(pte))
+#define pte_sw_dirty(pte) (!!(pte_val(pte) & PTE_DIRTY))
+#define pte_dirty(pte) (pte_sw_dirty(pte) || pte_hw_dirty(pte))
+
+static inline pte_t pte_mkclean(pte_t pte)
+{
+ /*
+ * Subsequent call to pte_hw_clr_dirty() is not required
+ * because pte_sw_clr_dirty() in turn does that as well.
+ */
+ return pte_sw_clr_dirty(pte);
+}
+
+But marking a dirty state, creating a write protected entry, etc. is a bit
+less trivial in hardware, as the PTE_RDONLY bit can only be cleared if the
+write bit is also set.
+
+static inline pte_t pte_hw_mkdirty(pte_t pte)
+{
+ if (pte_write(pte))
+ return clear_pte_bit(pte, __pgprot(PTE_RDONLY));
+
+ return pte;
+}
+
+Hence marking a dirty state sets both the SW and HW dirty bits, so that if
+HW support is unavailable or insufficient (dirty non-writable), the SW
+mechanism still puts the PTE in a dirty state.
+
+static inline pte_t pte_mkdirty(pte_t pte)
+{
+ pte = pte_sw_mkdirty(pte);
+ pte = pte_hw_mkdirty(pte);
+ return pte;
+}
+
+4. Preserving PTE HW Dirty State
+--------------------------------
+If for some reason the HW dirty state (PTE_WRITE set, PTE_RDONLY clear)
+needs to be cleared, the dirty state must first be transferred to the SW
+dirty bit so that it persists across the operation.
+
+static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
+{
+ .....
+ pte = pte_preserve_hw_dirty(pte);
+ .....
+}
+
+static inline pte_t pte_wrprotect(pte_t pte)
+{
+ pte = pte_preserve_hw_dirty(pte);
+ .....
+}
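
The body of pte_preserve_hw_dirty() is not shown in this document. A
minimal user-space sketch of the idea, assuming it simply sets the SW
dirty bit whenever the entry is HW dirty, might look like the following.
The model_* names and the raw 64-bit pteval_t are hypothetical and chosen
only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pteval_t;              /* assumption: raw 64-bit PTE */

#define PTE_RDONLY ((pteval_t)1 << 7)
#define PTE_WRITE  ((pteval_t)1 << 51)  /* shares its position with PTE_DBM */
#define PTE_DIRTY  ((pteval_t)1 << 55)

/* Assumed behaviour: copy a HW dirty state into the SW dirty bit. */
static pteval_t model_preserve_hw_dirty(pteval_t v)
{
	bool hw_dirty = (v & PTE_WRITE) && !(v & PTE_RDONLY);

	return hw_dirty ? (v | PTE_DIRTY) : v;
}

/* Write protect the entry without losing a previously recorded write. */
static pteval_t model_wrprotect(pteval_t v)
{
	v = model_preserve_hw_dirty(v);
	v &= ~PTE_WRITE;
	v |= PTE_RDONLY;
	return v;
}
```

With this model, write protecting a HW dirty entry yields a read-only
entry that still has PTE_DIRTY set, whereas a clean writable entry becomes
read-only with no dirty bit.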
--
2.30.2