Re: summarize all information again at bottom//reply: reply: [PATCH] mm: fix a race scenario in folio_isolate_lru

From: Matthew Wilcox
Date: Wed Mar 27 2024 - 10:09:20 EST


On Wed, Mar 27, 2024 at 09:25:59AM +0800, Zhaoyang Huang wrote:
> > Ignoring any other thread, you're basically saying that there's a
> > refcount imbalance here. Which means we'd hit an assert (that folio
> > refcount went below zero) in the normal case where another thread wasn't
> > simultaneously trying to do anything.
> Theoretically Yes but it is rare in practice as aops->readahead will
> launch all pages to IO under most scenarios.

Rare, but this path has been tested.

> read_pages
> aops->readahead[1]
> ...
> while (folio = readahead_folio)[2]
> filemap_remove_folio
>
> IMO, according to the comments of readahead_page, the refcnt
> represents page cache dropped in [1] makes sense for two reasons, '1.
> The folio is going to do IO and is locked until IO done;2. The refcnt
> will be added back when found again from the page cache and then serve
> for PTE or vfs' while it doesn't make sense in [2] as the refcnt of
> page cache will be dropped in filemap_remove_folio
>
> * Context: The page is locked and has an elevated refcount. The caller
> * should decreases the refcount once the page has been submitted for I/O
> * and unlock the page once all I/O to that page has completed.
> * Return: A pointer to the next page, or %NULL if we are done.

Follow the refcount through.

In page_cache_ra_unbounded():

folio = filemap_alloc_folio(gfp_mask, 0);
(folio has refcount 1)
ret = filemap_add_folio(mapping, folio, index + i, gfp_mask);
(folio has refcount 2)

Then we call read_pages()
First we call ->readahead() which for some reason stops early.
Then we call readahead_folio() which calls folio_put()
(folio has refcount 1)
Then we call folio_get()
(folio has refcount 2)
Then we call filemap_remove_folio()
(folio has refcount 1)
Then we call folio_unlock()
Then we call folio_put()
(folio has refcount 0 and is freed)

Yes, other things can happen in there to increment the refcount, so this
folio_put() might not be the last put, but we hold the folio locked the
entire time, so many things which might be attempted will block on the
folio lock. In particular, nobody can remove it from the page cache,
so its refcount cannot reach 0 until the last folio_put() of the
sequence.