To supplement: I think the unreclaimable dirty folios of the inactive file lru in shrink_folio_list() can exit the recycling flow early and avoid executing some time-consuming interfaces, such as folio_check_references() and try_to_unmap().
On 2023/10/19 22:15, David Hildenbrand wrote:
Hi,
On 19.10.23 15:14, Zhiguo Jiang wrote:
In shrink_folio_list() a dirty file folio can come from two
sources:
1. The folio arrives already dirty on the incoming folio_list,
i.e. from the inactive file lru.
2. The folio becomes dirty when try_to_unmap() transfers the PTE
dirty bit to it.
For the first source, if the dirty folio does not support pageout,
it can skip unmap in advance to reduce recycling time, as sketched
below.
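In effect, the early exit condenses to the check below (a simplified
sketch of the folio_check_pageout() logic in the diff, not a drop-in
replacement for it):

	/*
	 * Sketch: a dirty file folio that this reclaim context cannot
	 * write back is activated before the expensive
	 * folio_check_references()/try_to_unmap() calls ever run.
	 */
	if (folio_test_dirty(folio) && folio_is_file_lru(folio) &&
	    (!current_is_kswapd() || !folio_test_reclaim(folio) ||
	     !test_bit(PGDAT_DIRTY, &pgdat->flags)))
		goto activate_locked;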
Signed-off-by: Zhiguo Jiang <justinjiang@xxxxxxxx>
---
Changelog:
v1->v2:
1. Keep the original judgment flow.
2. Add the folio_check_pageout() interface.
3. A dirty folio in the inactive file lru which does not support
pageout skips unmap in advance.
mm/vmscan.c | 103 +++++++++++++++++++++++++++++++++-------------------
1 file changed, 66 insertions(+), 37 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a68d01fcc307..e067269275a5 100755
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -925,6 +925,44 @@ static void folio_check_dirty_writeback(struct folio *folio,
mapping->a_ops->is_dirty_writeback(folio, dirty, writeback);
}
+/* Check if a dirty folio can support pageout in the recycling process */
+static bool folio_check_pageout(struct folio *folio,
+ struct pglist_data *pgdat)
+{
+	bool ret = true;
+
+ /*
+ * Anonymous folios are not handled by flushers and must be written
+ * from reclaim context. Do not stall reclaim based on them.
+ * MADV_FREE anonymous folios are put into inactive file list too.
+ * They could be mistakenly treated as file lru. So further anon
+ * test is needed.
+ */
+ if (!folio_is_file_lru(folio) ||
+ (folio_test_anon(folio) && !folio_test_swapbacked(folio)))
+ goto out;
+
+ if (folio_test_dirty(folio) &&
+ (!current_is_kswapd() ||
+ !folio_test_reclaim(folio) ||
+ !test_bit(PGDAT_DIRTY, &pgdat->flags))) {
+ /*
+ * Immediately reclaim when written back.
+ * Similar in principle to folio_deactivate()
+ * except we already have the folio isolated
+ * and know it's dirty
+ */
+ node_stat_mod_folio(folio, NR_VMSCAN_IMMEDIATE,
+ folio_nr_pages(folio));
+ folio_set_reclaim(folio);
+
+ ret = false;
+ }
+
+out:
+ return ret;
+}
+
static struct folio *alloc_demote_folio(struct folio *src,
unsigned long private)
{
@@ -1078,6 +1116,12 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
if (dirty && !writeback)
stat->nr_unqueued_dirty += nr_pages;
+		/*
+		 * If the dirty folio does not support pageout,
+		 * it can skip this recycling.
+		 */
+ if (!folio_check_pageout(folio, pgdat))
+ goto activate_locked;
+
/*
* Treat this folio as congested if folios are cycling
* through the LRU so quickly that the folios marked
@@ -1261,43 +1305,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
enum ttu_flags flags = TTU_BATCH_FLUSH;
bool was_swapbacked = folio_test_swapbacked(folio);
- if (folio_test_dirty(folio)) {
- /*
- * Only kswapd can writeback filesystem folios
- * to avoid risk of stack overflow. But avoid
- * injecting inefficient single-folio I/O into
- * flusher writeback as much as possible: only
- * write folios when we've encountered many
- * dirty folios, and when we've already scanned
- * the rest of the LRU for clean folios and see
- * the same dirty folios again (with the reclaim
- * flag set).
- */
- if (folio_is_file_lru(folio) &&
- (!current_is_kswapd() ||
- !folio_test_reclaim(folio) ||
- !test_bit(PGDAT_DIRTY, &pgdat->flags))) {
- /*
- * Immediately reclaim when written back.
- * Similar in principle to folio_deactivate()
- * except we already have the folio isolated
- * and know it's dirty
- */
- node_stat_mod_folio(folio, NR_VMSCAN_IMMEDIATE,
- nr_pages);
- folio_set_reclaim(folio);
-
- goto activate_locked;
- }
-
- if (references == FOLIOREF_RECLAIM_CLEAN)
- goto keep_locked;
- if (!may_enter_fs(folio, sc->gfp_mask))
- goto keep_locked;
- if (!sc->may_writepage)
- goto keep_locked;
- }
-
if (folio_test_pmd_mappable(folio))
flags |= TTU_SPLIT_HUGE_PMD;
@@ -1323,6 +1330,28 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
mapping = folio_mapping(folio);
if (folio_test_dirty(folio)) {
+ /*
+ * Only kswapd can writeback filesystem folios
+ * to avoid risk of stack overflow. But avoid
+ * injecting inefficient single-folio I/O into
+ * flusher writeback as much as possible: only
+ * write folios when we've encountered many
+ * dirty folios, and when we've already scanned
+ * the rest of the LRU for clean folios and see
+ * the same dirty folios again (with the reclaim
+ * flag set).
+ */
+ if (folio_is_file_lru(folio) &&
+ !folio_check_pageout(folio, pgdat))
+ goto activate_locked;
+
+ if (references == FOLIOREF_RECLAIM_CLEAN)
+ goto keep_locked;
+ if (!may_enter_fs(folio, sc->gfp_mask))
+ goto keep_locked;
+ if (!sc->may_writepage)
+ goto keep_locked;
+
/*
* Folio is dirty. Flush the TLB if a writable entry
* potentially exists to avoid CPU writes after I/O
I'm confused. Did you apply this on top of v1 by accident?
According to the tracelog below from my modified
mm_vmscan_lru_shrink_inactive tracepoint, of the 32 scanned inactive
file pages, 20 were dirty; those 20 dirty pages were not reclaimed,
yet try_to_unmap() still spent 20us on them.
I think unreclaimable dirty folios in the inactive file lru can skip
try_to_unmap() entirely. Please help to continue the review. Thanks.
kswapd0-99 ( 99) [005] ..... 687.793724: mm_vmscan_lru_shrink_inactive: [Justin] nid 0 scan=32 isolate=32 reclamed=12 nr_dirty=20 nr_unqueued_dirty=20 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate[0]=0 nr_activate[1]=20 nr_ref_keep=0 nr_unmap_fail=0 priority=2 file=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC total=39 exe=0 reference_cost=5 reference_exe=0 unmap_cost=21 unmap_exe=0 dirty_unmap_cost=20 dirty_unmap_exe=0 pageout_cost=0 pageout_exe=0
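(The unmap_cost field above comes from my local tracepoint
modification and is not upstream. For reference, a minimal sketch of
the kind of timing that could produce it, with a hypothetical
unmap_ns accumulator around the try_to_unmap() call in
shrink_folio_list(); this is not the actual instrumentation:)

	/* hypothetical timing hook, not the actual patch */
	u64 t0 = ktime_get_ns();
	try_to_unmap(folio, flags);
	unmap_ns += ktime_get_ns() - t0;	/* reported as unmap_cost */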
--
Cheers,
David / dhildenb