Re: [RFC][PATCH] proc: export more page flags in /proc/kpageflags

From: Wu Fengguang
Date: Wed Apr 15 2009 - 23:49:51 EST


On Thu, Apr 16, 2009 at 10:26:51AM +0800, KOSAKI Motohiro wrote:
> tatus: RO
> Content-Length: 13245
> Lines: 380
>
> Hi
>
> > > > > On Tue, Apr 14, 2009 at 12:37:10PM +0800, KOSAKI Motohiro wrote:
> > > > > > > Export the following page flags in /proc/kpageflags,
> > > > > > > just in case they will be useful to someone:
> > > > > > >
> > > > > > > - PG_swapcache
> > > > > > > - PG_swapbacked
> > > > > > > - PG_mappedtodisk
> > > > > > > - PG_reserved
> > >
> > > PG_reserved should be exported as PG_KERNEL or somesuch.
> >
> > PG_KERNEL could be misleading. PG_reserved obviously do not cover all
> > (or most) kernel pages. So I'd prefer to export PG_reserved as it is.
> >
> > It seems that the vast amount of free pages are marked PG_reserved:
>
> Can I review the document at first?
> if no good document for administrator, I can't ack exposing PG_reserved.

btw, is this the expected behavior to mark so many free pages as PG_reserved?
Last time I looked at it, in 2.6.27, the free pages simply don't have
any flags set.

//Or maybe it's a false reporting of my tool. Will double check.

> > # uname -a
> > Linux hp 2.6.30-rc2 #157 SMP Wed Apr 15 19:37:49 CST 2009 x86_64 GNU/Linux
> > # echo 1 > /proc/sys/vm/drop_caches
> > # ./page-types
> > flags page-count MB symbolic-flags long-symbolic-flags
> > 0x004000 497474 1943 ______________r_____ reserved
> > 0x008000 4454 17 _______________o____ compound
> > 0x008014 5 0 __R_D__________o____ referenced,dirty,compound
> > 0x000020 1 0 _____l______________ lru
> > 0x000028 310 1 ___U_l______________ uptodate,lru
> > 0x00002c 18 0 __RU_l______________ referenced,uptodate,lru
> > 0x000068 80 0 ___U_lA_____________ uptodate,lru,active
> > 0x00006c 157 0 __RU_lA_____________ referenced,uptodate,lru,active
> > 0x002078 1 0 ___UDlA______b______ uptodate,dirty,lru,active,swapbacked
> > 0x00207c 17 0 __RUDlA______b______ referenced,uptodate,dirty,lru,active,swapbacked
> > 0x000228 13 0 ___U_l___x__________ uptodate,lru,reclaim
> > 0x000400 2085 8 __________B_________ buddy
>
> "freed" is better?
> buddy is implementation technique name.

Not compellingly better :-) I'd expect BUDDY to be a well recognized
technique, something close to LRU. PG_BUDDY could be documented as:
this page is owned by the buddy system, which manages free memory.

PG_FREED may seem more newbie friendly, but there will be the classical
newbie question: "Why so few freed pages?!" ;-)

It's not likely that an administrator not understanding BUDDY will
understand many of the other exported page flags. He will have to
query the document anyway. And exporting PG_buddy as it is could
be the best option for proficient users.

> > 0x000804 1 0 __R________m________ referenced,mmap
> > 0x002808 10 0 ___U_______m_b______ uptodate,mmap,swapbacked
> > 0x000828 1060 4 ___U_l_____m________ uptodate,lru,mmap
> > 0x00082c 215 0 __RU_l_____m________ referenced,uptodate,lru,mmap
> > 0x000868 189 0 ___U_lA____m________ uptodate,lru,active,mmap
> > 0x002868 4187 16 ___U_lA____m_b______ uptodate,lru,active,mmap,swapbacked
> > 0x00286c 30 0 __RU_lA____m_b______ referenced,uptodate,lru,active,mmap,swapbacked
> > 0x00086c 1012 3 __RU_lA____m________ referenced,uptodate,lru,active,mmap
> > 0x002878 3 0 ___UDlA____m_b______ uptodate,dirty,lru,active,mmap,swapbacked
> > 0x008880 936 3 _______S___m___o____ slab,mmap,compound
> > 0x000880 1602 6 _______S___m________ slab,mmap
>
> please don't display mmap and coumpound. it expose SLUB implentation detail.
> IOW, if slab flag on, please ignore following flags and mapcount.
> - PG_active
> - PG_error
> - PG_private
> - PG_compound
>
> BTW, if the page don't have PG_lru, following member and flags can be used another meanings.
> - PG_active
> - PG_referenced
> - page::_mapcount
> - PG_swapbacked
> - PG_reclaim
> - PG_unevictable
> - PG_mlocked
>
> and, if the page never interact IO layer, following flags can be used another meanings.
> - PG_uptodate
> - PG_dirty

Good point. I also noticed many of these conditional flags.
The perceived solution would be to do some filtering if
!CONFIG_DEBUG_KERNEL, to not confuse too many administrators.
For kernel developers we want to be faithful :-)

>
> > 0x0088c0 59 0 ______AS___m___o____ active,slab,mmap,compound
> > 0x0008c0 49 0 ______AS___m________ active,slab,mmap
> > total 513968 2007
>
>
> And, PageAnon() result seems provide good information if the page stay in lru.

Good point! Will add this bit.

> > # ./page-areas 0x004000
> > offset len KB
> > 0 15 60KB
> > 31 4 16KB
> > 159 97 388KB
> > 4096 2213 8852KB
> > 6899 2385 9540KB
> > 9497 3 12KB
> > 9728 14528 58112KB
> >
> > > > > > > - PG_private
> > > > > > > - PG_private_2
> > > > > > > - PG_owner_priv_1
> > > > > > >
> > > > > > > - PG_head
> > > > > > > - PG_tail
> > > > > > > - PG_compound
> > >
> > > I would combine these three into a pseudo "large page" flag.
> >
> > Very neat idea! Patch updated accordingly.
> >
> > However - one pity I observed:
> >
> > # ./page-areas 0x008000
> > offset len KB
> > 3088 4 16KB
> >
> > We can no longer tell if the above line means one 4-page hugepage, or two
> > 2-page hugepages... Adding PG_COMPOUND_TAIL into the CONFIG_DEBUG_KERNEL block
> > can help kernel developers. Or will it be ever cared by administrators?
> >
> > 341196 2 8KB
> > 341202 2 8KB
> > 341262 2 8KB
> > 341272 8 32KB
> > 341296 8 32KB
> > 488448 24 96KB
> > 488490 2 8KB
> > 488496 320 1280KB
> > 488842 2 8KB
> > 488848 40 160KB
> >
> > > > > > >
> > > > > > > - PG_unevictable
> > > > > > > - PG_mlocked
> > > > > > >
> > > > > > > - PG_poison
> > >
> > > PG_poison is also useful to export. But since it depends on my
> > > patchkit I will pull a patch for that into the HWPOISON series.
> >
> > That's not a problem - since the PG_poison line is be protected by
> > #ifdef CONFIG_MEMORY_FAILURE :-)
> >
> > > > > > > - PG_unevictable
> > > > > > > - PG_mlocked
> > > >
> > > > this 9 flags shouldn't exported.
> > > > I can't imazine administrator use what purpose those flags.
> > >
> > > I think an abstraced "PG_pinned" or somesuch flag that combines
> > > page lock, unevictable, mlocked would be useful for the administrator.
> >
> > The PG_PINNED abstraction risks hiding useful information.
> > The administrator may not only care about the pinned pages,
> > but also care _why_ they are pinned, i.e. ramfs.. or mlock?
> >
> > So it might be good to export them as is, with proper document.
> >
> > Here is the v2 patch, with flags for kernel hackers numbered from 32.
> > Comments are welcome!
>
> if you can write good document, PG_unevictable is exportable.
> but PG_mlock isn't.
>
> that's implementation tecknique of efficient unevictable pages for mlock.
> we can change the future.

Yup. That's in line with my vague feeling. For PG_unevictable we can
say that the page is owned by the unevictable (non-)lru and not a
candidate for LRU page reclaims. But for PG_mlock it's more about an
assistant for kernel optimizations and there are no guarantees...

Thanks,
Fengguang

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/