Re: [PATCH v5 1/1] mm: report per-page metadata information

From: Wei Xu
Date: Thu Nov 02 2023 - 14:07:47 EST


On Thu, Nov 2, 2023 at 10:12 AM Pasha Tatashin
<pasha.tatashin@xxxxxxxxxx> wrote:
>
> > > Wei, I noticed that all other fields in /proc/meminfo are part of
> > > MemTotal, but this new field may not be (depending on where struct pages
> >
> > I could have sworn that I pointed that out in a previous version and
> > requested to document that special case in the patch description. :)
>
> Sounds good, we will document that parts of the per-page metadata may
> not be part of MemTotal.

But this still doesn't answer how we can use the new PageMetadata
field to help break down the runtime kernel overhead within MemUsed
(MemTotal - MemFree).
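For concreteness, the breakdown in question can be sketched in userspace as below. This is a minimal sketch, assuming the usual "Name:   value kB" layout of /proc/meminfo; `PageMetadata` is only the field name proposed in this patch, so the code treats it as optional, and the sample values are made up.

```python
from typing import Optional


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style "Name:   value kB" lines into {name: value}."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def metadata_share_of_used(fields: dict[str, int]) -> tuple[int, Optional[float]]:
    """Return (MemUsed in kB, fraction of MemUsed attributed to PageMetadata).

    The fraction is only meaningful if PageMetadata counts memory that lies
    inside MemTotal (e.g. buddy allocations); otherwise it may describe
    memory that MemUsed never included in the first place.
    """
    used = fields["MemTotal"] - fields["MemFree"]
    meta = fields.get("PageMetadata")  # hypothetical field; may be absent
    return used, (meta / used if meta is not None else None)


# Made-up sample values, for illustration only.
SAMPLE = """\
MemTotal:       16000000 kB
MemFree:         4000000 kB
PageMetadata:     240000 kB
"""

used_kb, share = metadata_share_of_used(parse_meminfo(SAMPLE))
print(f"MemUsed: {used_kb} kB, PageMetadata share: {share:.2%}")
```

On a live system the same functions can be fed the contents of /proc/meminfo instead of SAMPLE.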

> > > are allocated), so what would be the best way to export page metadata
> > > without redefining MemTotal? Keep the new field in /proc/meminfo but
> > > accept that it is not part of MemTotal, or use two counters? If we use
> > > two counters, we would still need to keep the one for buddy allocations
> > > in /proc/meminfo and put the other one somewhere else?
> >

I think the simplest thing to do now is to report only the buddy
allocations of per-page metadata in meminfo. The meaning of the new
counter is then easier to understand and consistent with MemTotal and
the other fields in meminfo. Its implementation can also be greatly
simplified, and we won't need to handle other special cases either,
e.g. pagemeta allocated from DAX devices.
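To make the trade-off concrete, here is a toy model (hypothetical names throughout, not kernel code) of where the memmap can live and what a buddy-only counter would report. It assumes 4 KiB base pages and a 64-byte struct page, which is common on x86-64 but depends on architecture and config; at that size the memmap costs 64/4096 = 1/64 (about 1.56%) of the memory it describes.

```python
from dataclasses import dataclass
from enum import Enum

# Assumptions: 4 KiB base pages, 64-byte struct page (arch/config dependent).
PAGE_SIZE = 4096
STRUCT_PAGE_SIZE = 64


class Source(Enum):
    BUDDY = "buddy"        # e.g. memmap taken from the page allocator at hotplug time; inside MemTotal
    MEMBLOCK = "memblock"  # early-boot memmap, reserved before buddy exists; outside MemTotal
    DAX = "dax"            # memmap placed on the device itself; not system RAM at all


@dataclass
class MemmapAlloc:
    pages_described: int  # number of base pages this memmap chunk describes
    source: Source

    @property
    def kb(self) -> int:
        return self.pages_described * STRUCT_PAGE_SIZE // 1024


def page_metadata_kb(allocs: list[MemmapAlloc]) -> int:
    """Buddy-only counter: by construction a subset of MemTotal."""
    return sum(a.kb for a in allocs if a.source is Source.BUDDY)


def all_memmap_kb(allocs: list[MemmapAlloc]) -> int:
    """Every struct page regardless of where it lives (not a subset of MemTotal)."""
    return sum(a.kb for a in allocs)


GIB_PAGES = (1 << 30) // PAGE_SIZE  # base pages per GiB

allocs = [
    MemmapAlloc(16 * GIB_PAGES, Source.MEMBLOCK),  # boot memory
    MemmapAlloc(4 * GIB_PAGES, Source.BUDDY),      # hotplugged memory
    MemmapAlloc(8 * GIB_PAGES, Source.DAX),        # pmem with on-device memmap
]
print(page_metadata_kb(allocs), all_memmap_kb(allocs))
```

The buddy-only value can be compared directly against MemTotal - MemFree, while the all-sources value cannot, which is the consistency argument made above.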

> > IMHO, we should just leave MemTotal alone ("memory managed by the buddy
> > that could actually mostly get freed up and reused -- although that's
> > not completely true") and have a new counter that includes any system
> > memory (MemSystem? but as we learned, as separate files), including most
> > memblock allocations/reservations as well (metadata, early pagetables,
> > initrd, kernel, ...).
> >
> > Then you would actually know how much memory the system is using
> > (excluding things like crashmem, mem=, ...).
> >
> > That part is tricky, though -- I recall there are memblock reservations
> > that are similar to the crashkernel -- which is why the current state is
> > to account memory when it's handed to the buddy under MemTotal -- which
> > is straightforward and simple.
>
> It may be simpler if we define MemSystem as all the usable memory
> provided by the firmware to the Linux kernel.
> For BIOS systems it would be the "usable" ranges in the original e820
> memory map, before the kernel modifies it based on parameters such as mem=.
>
> For device-tree architectures, it would be the memory binding provided
> by the original device tree from the firmware.
>
> Pasha
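As a rough illustration of the MemSystem definition for BIOS systems (usable ranges of the original e820 map), the value could be approximated from the boot log. This is a sketch under stated assumptions: MemSystem is only a counter under discussion here, the "BIOS-e820: [mem start-end] type" line format is what x86 kernels print at boot as far as I know, and the sample map is made up. The device-tree case is not covered.

```python
import re

# Matches the firmware-provided map lines, e.g.
# "BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable".
# Only "BIOS-e820:" lines are used, i.e. the map before kernel modifications.
E820_LINE = re.compile(r"BIOS-e820: \[mem (0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)\] (\w+)")


def mem_system_bytes(boot_log: str) -> int:
    """Sum the "usable" ranges of the original e820 map (end address is inclusive)."""
    total = 0
    for m in E820_LINE.finditer(boot_log):
        start, end, kind = int(m.group(1), 16), int(m.group(2), 16), m.group(3)
        if kind == "usable":
            total += end - start + 1
    return total


# Made-up sample map, for illustration only.
SAMPLE_LOG = """\
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000023fffffff] usable
"""

print(f"MemSystem (hypothetical): {mem_system_bytes(SAMPLE_LOG) // 1024} kB")
```

Because only the firmware-provided lines are summed, a mem= limit would not shrink this value, which matches the intent of excluding such parameters.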