Re: [PATCH 1/2] mm: add MemAvailable to per-node meminfo

From: Qi Zheng
Date: Thu Dec 16 2021 - 10:31:43 EST




On 12/16/21 9:16 PM, Greg KH wrote:
On Thu, Dec 16, 2021 at 08:46:54PM +0800, Qi Zheng wrote:
In /proc/meminfo, we can show the sum of all the available memory
as "MemAvailable". Add the same counter also to per-node meminfo
under /sys.

With this counter, some processes that bind nodes can make some
decisions by reading the "MemAvailable" of the corresponding nodes
directly.

Signed-off-by: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx>
---
drivers/base/node.c | 4 ++++
include/linux/mm.h | 1 +
include/linux/mmzone.h | 5 +++++
mm/page_alloc.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 59 insertions(+)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 87acc47e8951..deb2a7965ae4 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -375,8 +375,10 @@ static ssize_t node_read_meminfo(struct device *dev,
struct sysinfo i;
unsigned long sreclaimable, sunreclaimable;
unsigned long swapcached = 0;
+ long available;
si_meminfo_node(&i, nid);
+ available = si_mem_available_node(&i, nid);
sreclaimable = node_page_state_pages(pgdat, NR_SLAB_RECLAIMABLE_B);
sunreclaimable = node_page_state_pages(pgdat, NR_SLAB_UNRECLAIMABLE_B);
#ifdef CONFIG_SWAP
@@ -386,6 +388,7 @@ static ssize_t node_read_meminfo(struct device *dev,
"Node %d MemTotal: %8lu kB\n"
"Node %d MemFree: %8lu kB\n"
"Node %d MemUsed: %8lu kB\n"
+ "Node %d MemAvailable: %8lu kB\n"

You just changed a user/kernel api without documenting it anywhere, or
ensuring that you did not just break anything.

Hi greg k-h,

The MemAvailable has long existed in the /proc/meminfo, it's meaning
has been described in the Documentation/filesystems/proc.rst. Since
the semantics of per-node MemAvailable has not changed, so I did not
add a new document description.


Also, this api is crazy, and not ok, please never add anything new to
it, it is broken as-is.

The consideration of adding per-node MemAvailable is that some processes
that bind nodes need this information to do some decisions.

Now their approach is to read other information in per-node meminfo
and /proc/sys/vm/watermark_scale_factor, and then approximate this
value. With this counter, they can directly read
/sys/devices/system/node/node*/meminfo to get the MemAvailable
information of each node.

And MemTotal, MemFree and SReclaimable(etc.) all have corresponding
per-node versions, so I think that adding per-node MemAvailable might
also make sense. :)

Thanks,
Qi


thanks,

greg k-h


--
Thanks,
Qi