Re: [PATCH v7 1/1] mm: report per-page metadata information

From: Yosry Ahmed
Date: Tue Jan 30 2024 - 12:10:04 EST


On Mon, Jan 29, 2024 at 02:52:23PM -0800, Greg KH wrote:
> On Mon, Jan 29, 2024 at 02:42:04PM -0800, Sourav Panda wrote:
> > Adds two new per-node fields, namely nr_page_metadata and
> > nr_page_metadata_boot, to /sys/devices/system/node/nodeN/vmstat
> > and a global PageMetadata field to /proc/meminfo. This information can
> > be used by users to see how much memory is being used by per-page
> > metadata, which can vary depending on build configuration, machine
> > architecture, and system use.
> >
> > Per-page metadata is the amount of memory that Linux needs in order to
> > manage memory at the page granularity. The majority of such memory is
> > used by "struct page" and "page_ext" data structures. In contrast to
> > most other memory consumption statistics, per-page metadata might not
> > be included in MemTotal. For example, MemTotal does not include memblock
> > allocations but includes buddy allocations. In this patch, exported
> > field nr_page_metadata in /sys/devices/system/node/nodeN/vmstat would
> > exclusively track buddy allocations while nr_page_metadata_boot would
> > exclusively track memblock allocations. Furthermore, PageMetadata in
> > /proc/meminfo would exclusively track buddy allocations allowing it to
> > be compared against MemTotal.
> >
> > This memory depends on build configurations, machine architectures, and
> > the way system is used:
> >
> > Build configuration may include extra fields into "struct page",
> > and enable / disable "page_ext"
> > Machine architecture defines base page sizes. For example 4K x86,
> > 8K SPARC, 64K ARM64 (optionally), etc. The per-page metadata
> > overhead is smaller on machines with larger page sizes.
> > System use can change per-page overhead by using vmemmap
> > optimizations with hugetlb pages, and emulated pmem devdax pages.
> > Also, boot parameters can determine whether page_ext is needed
> > to be allocated. This memory can be part of MemTotal or be outside
> > MemTotal depending on whether the memory was hot-plugged, booted with,
> > or hugetlb memory was returned back to the system.
> >
> > Suggested-by: Pasha Tatashin <pasha.tatashin@xxxxxxxxxx>
> > Signed-off-by: Sourav Panda <souravpanda@xxxxxxxxxx>
> > ---
> > Documentation/filesystems/proc.rst | 3 +++
> > fs/proc/meminfo.c | 4 ++++
> > include/linux/mmzone.h | 4 ++++
> > include/linux/vmstat.h | 4 ++++
> > mm/hugetlb_vmemmap.c | 19 ++++++++++++++----
> > mm/mm_init.c | 3 +++
> > mm/page_alloc.c | 1 +
> > mm/page_ext.c | 32 +++++++++++++++++++++---------
> > mm/sparse-vmemmap.c | 8 ++++++++
> > mm/sparse.c | 7 ++++++-
> > mm/vmstat.c | 26 +++++++++++++++++++++++-
> > 11 files changed, 96 insertions(+), 15 deletions(-)
> >
> > diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> > index 49ef12df631b..d5901d04e082 100644
> > --- a/Documentation/filesystems/proc.rst
> > +++ b/Documentation/filesystems/proc.rst
> > @@ -993,6 +993,7 @@ Example output. You may not have all of these fields.
> > AnonPages: 4654780 kB
> > Mapped: 266244 kB
> > Shmem: 9976 kB
> > + PageMetadata: 513419 kB
> > KReclaimable: 517708 kB
> > Slab: 660044 kB
> > SReclaimable: 517708 kB
>
> Why are you adding it to the middle of the file? Are you sure the
> userspace tools that parse this file today can handle an unknown field
> here, and not just at the end of the file?

FWIW, looking at git blame for fs/proc/meminfo.c, it seems like people
have generally been adding items where it makes sense semantically, not
at the end of the file. So maybe that's okay for userspace tools.