Re: [RFC v3 2/4] mm: move PG_slab flag to page_type

From: Matthew Wilcox
Date: Fri Feb 03 2023 - 11:20:14 EST


On Fri, Feb 03, 2023 at 04:00:08PM +0000, Hyeonggon Yoo wrote:
> On Mon, Jan 30, 2023 at 05:11:48AM +0000, Matthew Wilcox wrote:
> > On Mon, Jan 30, 2023 at 01:34:59PM +0900, Hyeonggon Yoo wrote:
> > > > Seems like quite some changes to page_type to accommodate SLAB, which is
> > > > hopefully going away soon(TM). Could we perhaps avoid that?
> > >
> > > If it could be done with fewer changes, I'll try to avoid that.
> >
> > Let me outline the idea I had for removing PG_slab:
> >
> > Observe that PG_reserved and PG_slab are mutually exclusive. Also,
> > if PG_reserved is set, no other flags are set. If PG_slab is set, only
> > PG_locked is used. Many of the flags are only for use by anon/page
> > cache pages (eg referenced, uptodate, dirty, lru, active, workingset,
> > waiters, error, owner_priv_1, writeback, mappedtodisk, reclaim,
> > swapbacked, unevictable, mlocked).
> >
> > Redefine PG_reserved as PG_kernel. Now we can use the other _15_
> > flags to indicate pagetype, as long as PG_kernel is set.
>
> So PG_kernel is a new special flag, I thought it indicates
> "not usermappable pages", but considering PG_vmalloc it's not.

Right, it means "The kernel allocated this page for its own purposes;
what that purpose is might be available by looking at PG_type". ie
it's not-anon, not-page-cache.

> > So, eg
> > PageSlab() can now be (page->flags & PG_type) == PG_slab where
>
> But if PG_xxx and PG_slab share the same bit, PG_xxx would be confused?

Correct. Ideally those tests wouldn't be used on arbitrary pages,
only pages which are already confirmed to be anon or file. I suspect
we haven't been super-careful about that in the past, and so there
would be some degree of "Oh, we need to fix this up". But flags like
PG_mappedtodisk, PG_mlocked, PG_unevictable, PG_workingset should be
all gated behind "We know this is anon/file".
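
A minimal sketch of the aliasing problem and of one way to gate a test,
assuming the rearranged layout quoted further down in this mail. The
values are copied from that proposal, not from any tree, and the
sketch_* helpers are hypothetical names invented for illustration. The
gate is folded into the test here only to make the effect visible; the
intent described above is that callers already know the page is
anon/file before asking.

```c
#include <stdbool.h>

/* Values from the layout proposed in this mail; assumptions, not upstream. */
#define PG_kernel	0x000100UL
#define PG_uptodate	0x000400UL
#define PG_type		0x1fff00UL
#define PG_slab		(PG_kernel | 0x400UL)

/* Hypothetical: type test under the proposed encoding. */
static inline bool sketch_page_slab(unsigned long flags)
{
	return (flags & PG_type) == PG_slab;
}

/* Ungated test: also fires on slab pages, because 0x400 doubles as a type bit. */
static inline bool sketch_uptodate_naive(unsigned long flags)
{
	return (flags & PG_uptodate) != 0;
}

/* Gated test: the bit only means "uptodate" when PG_kernel is clear. */
static inline bool sketch_uptodate_gated(unsigned long flags)
{
	return !(flags & PG_kernel) && (flags & PG_uptodate);
}
```

With these definitions a slab page passes the naive uptodate test but
not the gated one, which is the kind of fix-up mentioned above.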

> > #define PG_kernel 0x00001
> > #define PG_type (PG_kernel | 0x7fff0)
> > #define PG_slab (PG_kernel | 0x00010)
> > #define PG_reserved (PG_kernel | 0x00020)
> > #define PG_buddy (PG_kernel | 0x00030)
> > #define PG_offline (PG_kernel | 0x00040)
> > #define PG_table (PG_kernel | 0x00050)
> > #define PG_guard (PG_kernel | 0x00060)
> >
> > That frees up the existing PG_slab, lets us drop the page_type field
> > altogether and gives us space to define all the page types we might
> > want (eg PG_vmalloc)
> >
> > We'll want to reorganise all the flags which are for anon/file pages
> > into a contiguous block. And now that I think about it, vmalloc pages
> > can be mapped to userspace, so they can get marked dirty, so only
> > 14 bits are available. Maybe rearrange to ...
> >
> > PG_locked 0x000001
> > PG_writeback 0x000002
> > PG_head 0x000004
>
> I think slab still needs PG_head,
> but it seems to be okay with this layout.
> (but these assumptions are better documented, I think)

Yes, slab needs PG_head so it knows whether this is a multi-page slab or
not. I forgot to mention it above as a bit that slab needs, but I put
it in the low bits here.
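
To illustrate why keeping PG_head in the low bits matters, here is a
rough sketch assuming the rearranged layout from this mail. Because
PG_head and PG_locked sit outside the PG_type mask, setting them does
not disturb the type comparison. The helper names are hypothetical.

```c
#include <stdbool.h>

/* Values from the layout proposed in this mail; assumptions, not upstream. */
#define PG_locked	0x000001UL
#define PG_head		0x000004UL
#define PG_kernel	0x000100UL
#define PG_type		0x1fff00UL
#define PG_slab		(PG_kernel | 0x400UL)

/* Hypothetical: the type comparison ignores every bit outside PG_type. */
static inline bool sketch_slab_type_matches(unsigned long flags)
{
	return (flags & PG_type) == PG_slab;
}

/* Hypothetical: a slab page with PG_head set is the head of a multi-page slab. */
static inline bool sketch_slab_is_multipage(unsigned long flags)
{
	return sketch_slab_type_matches(flags) && (flags & PG_head);
}
```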

> > PG_dirty 0x000008
> > PG_owner_priv_1 0x000010
> > PG_arch_1 0x000020
> > PG_private 0x000040
> > PG_waiters 0x000080
> > PG_kernel 0x000100
> > PG_referenced 0x000200
> > PG_uptodate 0x000400
> > PG_lru 0x000800
> > PG_active 0x001000
> > PG_workingset 0x002000
> > PG_error 0x004000
> > PG_private_2 0x008000
> > PG_mappedtodisk 0x010000
> > PG_reclaim 0x020000
> > PG_swapbacked 0x040000
> > PG_unevictable 0x080000
> > PG_mlocked 0x100000
> >
> > ... or something. There are a number of constraints and it may take
> > a few iterations to get this right. Oh, and if this is the layout
> > we use, then:
> >
> > PG_type 0x1fff00
> > PG_reserved (PG_kernel | 0x200)
> > PG_slab (PG_kernel | 0x400)
> > PG_buddy (PG_kernel | 0x600)
> > PG_offline (PG_kernel | 0x800)
> > PG_table (PG_kernel | 0xa00)
> > PG_guard (PG_kernel | 0xc00)
> > PG_vmalloc (PG_kernel | 0xe00)
>
> what is PG_vmalloc for, is it just an example for
> explaining possible layout?

I really want to mark pages as being allocated from vmalloc. It's
one of the things we could do to make debugging better.
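
As a rough illustration of the debugging angle, a page dump could name
the type directly once every kernel allocation carries one. This is
only a sketch assuming the PG_type values listed above;
sketch_page_type_name() is a hypothetical helper, not an existing
function.

```c
/* Values from the layout proposed in this mail; assumptions, not upstream. */
#define PG_kernel	0x000100UL
#define PG_type		0x1fff00UL
#define PG_reserved	(PG_kernel | 0x200UL)
#define PG_slab		(PG_kernel | 0x400UL)
#define PG_buddy	(PG_kernel | 0x600UL)
#define PG_offline	(PG_kernel | 0x800UL)
#define PG_table	(PG_kernel | 0xa00UL)
#define PG_guard	(PG_kernel | 0xc00UL)
#define PG_vmalloc	(PG_kernel | 0xe00UL)

/* Hypothetical: map a flags word to a printable type for debug output. */
static inline const char *sketch_page_type_name(unsigned long flags)
{
	/* Without PG_kernel the upper bits keep their anon/file meanings. */
	if (!(flags & PG_kernel))
		return "not PG_kernel";

	switch (flags & PG_type) {
	case PG_reserved:	return "reserved";
	case PG_slab:		return "slab";
	case PG_buddy:		return "buddy";
	case PG_offline:	return "offline";
	case PG_table:		return "table";
	case PG_guard:		return "guard";
	case PG_vmalloc:	return "vmalloc";
	default:		return "kernel, unknown type";
	}
}
```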