Re: [PATCH] mm: test page->flags directly in page_lru()

From: Yu Zhao
Date: Wed Feb 24 2021 - 17:35:51 EST


On Wed, Feb 24, 2021 at 09:56:39PM +0000, Matthew Wilcox wrote:
> On Wed, Feb 24, 2021 at 05:15:58AM -0800, Andrew Morton wrote:
> > Here's what the preprocessor produces for an allmodconfig version of
> > PageActive():
> >
> > static inline __attribute__((__gnu_inline__)) __attribute__((__unused__)) __attribute__((no_instrument_function)) __attribute__((__always_inline__)) int PageActive(struct page *page)
> > {
> > return test_bit(PG_active, &({ do { if (__builtin_expect(!!(PagePoisoned(compound_head(page))), 0)) { dump_page(compound_head(page), "VM_BUG_ON_PAGE(" "PagePoisoned(compound_head(page))"")"); do { ({ asm volatile("%c0: nop\n\t" ".pushsection .discard.instr_begin\n\t" ".long %c0b - .\n\t" ".popsection\n\t" : : "i" (373)); }); do { asm volatile("1:\t" ".byte 0x0f, 0x0b" "\n" ".pushsection __bug_table,\"aw\"\n" "2:\t" ".long " "1b" " - 2b" "\t# bug_entry::bug_addr\n" "\t" ".long " "%c0" " - 2b" "\t# bug_entry::file\n" "\t.word %c1" "\t# bug_entry::line\n" "\t.word %c2" "\t# bug_entry::flags\n" "\t.org 2b+%c3\n" ".popsection" : : "i" ("./include/linux/page-flags.h"), "i" (338), "i" (0), "i" (sizeof(struct bug_entry))); } while (0); do { ({ asm volatile("%c0:\n\t" ".pushsection .discard.unreachable\n\t" ".long %c0b - .\n\t" ".popsection\n\t" : : "i" (374)); }); asm volatile(""); __builtin_unreachable(); } while (0); } while (0); } } while (0); compound_head(page); })->flags);
> >
> > }
> >
> > That's all to test a single bit!
> >
> > Four calls to compound_head().
>
> If only somebody were working on a patch series to get rid of
> all those calls to compound_head()! Some reviews on
> https://lore.kernel.org/linux-mm/20210128070404.1922318-2-willy@xxxxxxxxxxxxx/
> would be nice.

I'm on board with the idea and have done some research in this
direction. We've found that the ideal *anon* page size for Chrome OS
is not 4KB or 2MB, but 32KB. I hope we could leverage the folio to
support flexible anon page size to reduce the number of page faults
(vs 4KB) or internal fragmentation (vs 2MB).

That being said, it seems to me this is a long term plan and right
now we need something smaller. So if you don't mind, I'll just go
ahead and remove compound_head() from Page{LRU,Active,Unevictable,
SwapBacked} first?

> So, I haven't done page_lru() yet in my folio tree. What I would do is:
>
> diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
> index 355ea1ee32bd..3895cfe6502b 100644
> --- a/include/linux/mm_inline.h
> +++ b/include/linux/mm_inline.h
> @@ -63,22 +63,27 @@ static __always_inline void __clear_page_lru_flags(struct page *page)
> * Returns the LRU list a page should be on, as an index
> * into the array of LRU lists.
> */
> -static __always_inline enum lru_list page_lru(struct page *page)
> +static __always_inline enum lru_list folio_lru(struct folio *folio)
> {
> enum lru_list lru;
>
> - VM_BUG_ON_PAGE(PageActive(page) && PageUnevictable(page), page);
> + VM_BUG_ON_PAGE(FolioActive(folio) && FolioUnevictable(folio), folio);
>
> - if (PageUnevictable(page))
> + if (FolioUnevictable(folio))
> return LRU_UNEVICTABLE;
>
> - lru = page_is_file_lru(page) ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON;
> - if (PageActive(page))
> + lru = page_is_file_lru(&folio->page) ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON;
> + if (FolioActive(folio))
> lru += LRU_ACTIVE;
>
> return lru;
> }
>
> +static __always_inline enum lru_list page_lru(struct page *page)
> +{
> + return folio_lru(page_folio(page));
> +}
> +
> static __always_inline void add_page_to_lru_list(struct page *page,
> struct lruvec *lruvec)
> {
>
> That would cause compound_head() to be called once instead of four times
> (assuming VM_BUG_ON is enabled). It can be reduced down to zero times
> when the callers are converted from being page-based to being folio-based.
>
> There is a further problem with PageFoo() being a READ_ONCE()
> of page->flags, so the compiler can't CSE it. I have ideas in that
> direction too; essentially ...
>
> unsigned long page_flags = PageFlags(page);
>
> if (PageFlagUnevictable(flags))
> ...
> if (PageFlagsActive(flags))
> ...
>
> and we can generate the PageFlagsFoo macros with the same machinery in
> page-flags.h that generates PageFoo and FolioFoo. This strikes me as
> less critical than the folio work to remove all the unnecessary calls
> to compound_head().
>
> > movq %rbx, %rbp # page, _14
> > # ./include/linux/page-flags.h:184: unsigned long head = READ_ONCE(page->compound_head);
> > call __sanitizer_cov_trace_pc #
>
> It's a bit unfair to complain about code generation with a
> sanitizer-enabled build ...
>