Re: [PATCH v5 1/1] mm: report per-page metadata information

From: David Hildenbrand
Date: Thu Nov 02 2023 - 12:59:16 EST


On 02.11.23 17:43, Pasha Tatashin wrote:
On Thu, Nov 2, 2023 at 12:09 PM David Hildenbrand <david@xxxxxxxxxx> wrote:

On 02.11.23 17:02, Pasha Tatashin wrote:
On Thu, Nov 2, 2023 at 11:53 AM David Hildenbrand <david@xxxxxxxxxx> wrote:

On 02.11.23 16:50, Pasha Tatashin wrote:
Adding reserved memory to MemTotal is a cleaner approach IMO as well.
But it changes the semantics of MemTotal, which may have compatibility
issues.

I object.

Could you please elaborate what you object (and why): you object that
it will have compatibility issues, or you object to include memblock
reserves into MemTotal?

Sorry, I object to changing the semantics of MemTotal. MemTotal is
traditionally the memory managed by the buddy, not all memory in the
system. I know people/scripts that are relying on that [although it's
been source of confusion a couple of times].

What if one day we change so that struct pages are allocated from
buddy allocator (i.e. allocate deferred struct pages from buddy) will

It does on memory hotplug. But for things like crashkernel size
detection doesn't really care about that.

"Crash kernel" is a different case: it is kernel external memory,
similar to limiting the amount of physical memory via mem=/memmap=, it
sets memory that cannot be used by this kernel, but only by the crash
kernel. Also, the crash kernel reserve is exposed in /proc/iomem via
"Crash kernel" range.

Agreed.


Page metadata memory on the other hand, is used by this kernel, and
also can be changed by this kernel depending on how the memory is
used: memdec, hotplug, THP, emulated pmem etc.

And then, there is the "altmap" for dax, where the metadata is placed on the dax memory itself. I mean, it's system RAM (or NVDIMM or whatever) used for metadata, but not managed by the buddy.

There is now also the "memmap_on_memory" feature for memory hotplug, where we do the same for ordinary hotplug memory (but some memory aside for the memmap and not allocate it from the buddy). We'd have to account that one as well as metadata, I think. I don't think it would get accounted under MemTotal (because, not managed by the buddy) as of now.


it break those MemTotal scripts? What if the size of struct pages
changes significantly, but the overhead will come from other metadata
(i.e. memdesc) will that break those scripts? I feel like struct page

Probably; but ideally the metadata overhead will be smaller with
memdesc. And we'll talk about that once it gets real ;)

The size and allocation of struct pages change MemTotal today, during
runtime, even without memdesc, I just brought it up, to emphasize that
this is something that we should resolve now before it gets worse.

I don't quite see the immediate need for action, but I get what you are saying. It's a historical mess, but if we want to tackle it, we should tackle it completely and not only sort out the metadata accounting.


memory should really be included into MemTotal, otherwise we will have
this struggle in the future when we try to optimize struct page
memory.
How far do we want to go, do we want to include crashkernel reserved
memory in MemTotal because it is system memory? Only metadata? what else
allocated using memblock?

Again, right now it's simple: MemTotal is memory managed by the buddy.

The spirit of this patch set is good, modifying existing counters needs
good justification.

Wei, noticed that all other fields in /proc/meminfo are part of
MemTotal, but this new field may be not (depending where struct pages

I could have sworn that I pointed that out in a previous version and requested to document that special case in the patch description. :)

are allocated), so what would be the best way to export page metadata
without redefining MemTotal? Keep the new field in /proc/meminfo but
be ok that it is not part of MemTotal or do two counters? If we do two
counters, we will still need to keep one that is a buddy allocator in
/proc/meminfo and the other one somewhere outside?

IMHO, we should just leave MemTotal alone ("memory managed by the buddy that could actually mostly get freed up and reused -- although that's not completely true") and have a new counter that includes any system memory (MemSystem? but as we learned, as separate files), including most memblock allocations/reservations as well (metadata, early pagetables, initrd, kernel, ...).

The you would actually know how much memory the system is using (exclusing things like crashmem, mem=, ...).

That part is tricky, though -- I recall there are memblock reservations that are similar to the crashkernel -- which is why the current state is to account memory when it's handed to the buddy under MemTotal -- which is straight forward and simply.

I'm happy to discuss this further, if that direction is worth exploring.

--
Cheers,

David / dhildenb