Re: [PATCH v2 1/2] mm:vmscan: the dirty folio in folio_list skip unmap

From: zhiguojiang
Date: Fri Oct 20 2023 - 00:10:09 EST




On 2023/10/20 11:59, zhiguojiang wrote:


On 2023/10/19 22:15, David Hildenbrand wrote:

On 19.10.23 15:14, Zhiguo Jiang wrote:
In shrink_folio_list() a dirty file folio can come from two sources:
1. The incoming folio_list parameter, which is the inactive file lru.
2. The PTE dirty bit transferred by try_to_unmap().

For the first source, if the dirty folio does not support pageout, it
can skip unmap in advance to reduce reclaim time.

Signed-off-by: Zhiguo Jiang <justinjiang@xxxxxxxx>
---

Changelog:
v1->v2:
1. Keep the original judgment flow.
2. Add the folio_check_pageout() interface.
3. A dirty folio in the inactive file lru which does not support
    pageout skips unmap in advance.

  mm/vmscan.c | 103 +++++++++++++++++++++++++++++++++-------------------
  1 file changed, 66 insertions(+), 37 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a68d01fcc307..e067269275a5 100755
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -925,6 +925,44 @@ static void folio_check_dirty_writeback(struct folio *folio,
              mapping->a_ops->is_dirty_writeback(folio, dirty, writeback);
  }

+/* Check if a dirty folio can support pageout in the reclaim process */
+static bool folio_check_pageout(struct folio *folio,
+                                             struct pglist_data *pgdat)
+{
+     bool ret = true;
+
+     /*
+      * Anonymous folios are not handled by flushers and must be written
+      * from reclaim context. Do not stall reclaim based on them.
+      * MADV_FREE anonymous folios are put into inactive file list too.
+      * They could be mistakenly treated as file lru. So further anon
+      * test is needed.
+      */
+     if (!folio_is_file_lru(folio) ||
+             (folio_test_anon(folio) && !folio_test_swapbacked(folio)))
+             goto out;
+
+     if (folio_test_dirty(folio) &&
+             (!current_is_kswapd() ||
+              !folio_test_reclaim(folio) ||
+              !test_bit(PGDAT_DIRTY, &pgdat->flags))) {
+             /*
+              * Immediately reclaim when written back.
+              * Similar in principle to folio_deactivate()
+              * except we already have the folio isolated
+              * and know it's dirty
+              */
+             node_stat_mod_folio(folio, NR_VMSCAN_IMMEDIATE,
+                     folio_nr_pages(folio));
+             folio_set_reclaim(folio);
+
+             ret = false;
+     }
+
+out:
+     return ret;
+}
+
  static struct folio *alloc_demote_folio(struct folio *src,
              unsigned long private)
  {
@@ -1078,6 +1116,12 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
              if (dirty && !writeback)
                      stat->nr_unqueued_dirty += nr_pages;

+             /* If the dirty folio does not support pageout,
+              * it can skip this reclaim pass.
+              */
+             if (!folio_check_pageout(folio, pgdat))
+                     goto activate_locked;
+
              /*
               * Treat this folio as congested if folios are cycling
               * through the LRU so quickly that the folios marked
@@ -1261,43 +1305,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
                      enum ttu_flags flags = TTU_BATCH_FLUSH;
                      bool was_swapbacked = folio_test_swapbacked(folio);

-                     if (folio_test_dirty(folio)) {
-                             /*
-                              * Only kswapd can writeback filesystem folios
-                              * to avoid risk of stack overflow. But avoid
-                              * injecting inefficient single-folio I/O into
-                              * flusher writeback as much as possible: only
-                              * write folios when we've encountered many
-                              * dirty folios, and when we've already scanned
-                              * the rest of the LRU for clean folios and see
-                              * the same dirty folios again (with the reclaim
-                              * flag set).
-                              */
-                             if (folio_is_file_lru(folio) &&
-                                     (!current_is_kswapd() ||
-                                      !folio_test_reclaim(folio) ||
-                                      !test_bit(PGDAT_DIRTY, &pgdat->flags))) {
-                                     /*
-                                      * Immediately reclaim when written back.
-                                      * Similar in principle to folio_deactivate()
-                                      * except we already have the folio isolated
-                                      * and know it's dirty
-                                      */
-                                     node_stat_mod_folio(folio, NR_VMSCAN_IMMEDIATE,
-                                             nr_pages);
-                                     folio_set_reclaim(folio);
-
-                                     goto activate_locked;
-                             }
-
-                             if (references == FOLIOREF_RECLAIM_CLEAN)
-                                     goto keep_locked;
-                             if (!may_enter_fs(folio, sc->gfp_mask))
-                                     goto keep_locked;
-                             if (!sc->may_writepage)
-                                     goto keep_locked;
-                     }
-
                      if (folio_test_pmd_mappable(folio))
                              flags |= TTU_SPLIT_HUGE_PMD;

@@ -1323,6 +1330,28 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,

              mapping = folio_mapping(folio);
              if (folio_test_dirty(folio)) {
+                     /*
+                      * Only kswapd can writeback filesystem folios
+                      * to avoid risk of stack overflow. But avoid
+                      * injecting inefficient single-folio I/O into
+                      * flusher writeback as much as possible: only
+                      * write folios when we've encountered many
+                      * dirty folios, and when we've already scanned
+                      * the rest of the LRU for clean folios and see
+                      * the same dirty folios again (with the reclaim
+                      * flag set).
+                      */
+                     if (folio_is_file_lru(folio) &&
+                         !folio_check_pageout(folio, pgdat))
+                             goto activate_locked;
+
+                     if (references == FOLIOREF_RECLAIM_CLEAN)
+                             goto keep_locked;
+                     if (!may_enter_fs(folio, sc->gfp_mask))
+                             goto keep_locked;
+                     if (!sc->may_writepage)
+                             goto keep_locked;
+
                      /*
                       * Folio is dirty. Flush the TLB if a writable entry
                       * potentially exists to avoid CPU writes after I/O

I'm confused. Did you apply this on top of v1 by accident?
Hi,
According to the tracelog from my instrumented mm_vmscan_lru_shrink_inactive, 20 of the 32 scanned inactive file pages were dirty; the 20 dirty pages were not reclaimed, yet they still took 20us in try_to_unmap().

I think an unreclaimable dirty folio in the inactive file lru can skip try_to_unmap(); a toy model after the trace below illustrates the saving. Please help to continue the review. Thanks.

kswapd0-99      (     99) [005] .....   687.793724: mm_vmscan_lru_shrink_inactive: [Justin] nid 0 scan=32 isolate=32 reclamed=12 nr_dirty=20 nr_unqueued_dirty=20 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate[0]=0 nr_activate[1]=20 nr_ref_keep=0 nr_unmap_fail=0 priority=2 file=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC total=39 exe=0 reference_cost=5 reference_exe=0 unmap_cost=21 unmap_exe=0 dirty_unmap_cost=20 dirty_unmap_exe=0 pageout_cost=0 pageout_exe=0
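
To make the ordering argument concrete, below is a toy userspace model (hypothetical types and a stub check, not the kernel code) of filtering unreclaimable dirty file folios before the unmap step; the folio counts mirror the trace above:

/*
 * Toy userspace model (hypothetical types, NOT kernel code) of the
 * reordering proposed here: filter out dirty file folios that cannot
 * be paged out before paying for the expensive unmap step.
 */
#include <stdbool.h>
#include <stdio.h>

struct folio {                  /* toy stand-in for the kernel's folio */
        bool file_lru;
        bool dirty;
        bool reclaim_flagged;
};

/*
 * Stand-in for folio_check_pageout(): outside kswapd, a dirty file
 * folio must not be written back from reclaim context, so it cannot
 * be paged out in this pass and is activated instead.
 */
static bool check_pageout(const struct folio *f, bool is_kswapd)
{
        if (!f->file_lru)
                return true;
        if (f->dirty && (!is_kswapd || !f->reclaim_flagged))
                return false;
        return true;
}

int main(void)
{
        struct folio list[32] = {0};
        int i, unmapped = 0, activated = 0;

        /* Mirror the trace above: 20 of the 32 scanned folios are dirty. */
        for (i = 0; i < 32; i++) {
                list[i].file_lru = true;
                list[i].dirty = (i < 20);
        }

        for (i = 0; i < 32; i++) {
                if (!check_pageout(&list[i], false)) {
                        activated++;    /* skipped before any unmap work */
                        continue;
                }
                unmapped++;             /* would reach try_to_unmap() */
        }
        printf("unmapped=%d activated_early=%d\n", unmapped, activated);
        return 0;
}

Built with a plain cc, this prints unmapped=12 activated_early=20, which lines up with reclamed=12 and nr_activate[1]=20 in the trace: the 20 dirty folios never reach the unmap stage.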

To supplement: I think an unreclaimable dirty folio of the inactive file lru in shrink_folio_list() can exit the reclaim flow in advance and avoid executing some time-consuming interfaces, such as folio_check_references() and try_to_unmap(); the trace numbers bear this out, as worked below.
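
Working the costs from the trace above (in the units of that instrumented log): reference_cost=5 plus unmap_cost=21 is 26 out of total=39, so roughly two thirds of this batch's time went into the two steps the early skip bypasses, and dirty_unmap_cost=20 indicates nearly all of the unmap cost came from the dirty folios.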
--
Cheers,

David / dhildenb